An analysis of building component energy usage: a data driven approach to formulate a guideline
(USC Thesis Other)
Copyright [2021] Muntaseer Khan
AN ANALYSIS OF BUILDING COMPONENT ENERGY USAGE
A DATA DRIVEN APPROACH TO FORMULATE A GUIDELINE
by
Muntaseer Khan
A Thesis Presented to the
FACULTY OF THE USC GRADUATE SCHOOL OF ARCHITECTURE
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
MASTER OF BUILDING SCIENCE
MAY 2021
ACKNOWLEDGEMENTS
This thesis research was accomplished through the dedication and encouragement of several
individuals.
First and foremost, I would like to express my sincere gratitude and respect to Professor Joon-Ho
Choi for his infinite patience and supervision throughout the whole research process. Secondly, I
would like to thank Professors Marc Schiler, Karen Kensek and Yao-Yi Chiang for sharing
their vast knowledge and useful insight, which helped throughout the journey of this thesis. My
gratitude also goes to Dr. Okheon Lee for his exceptional prior work in gathering the data that
was used in this research. These individuals shaped and guided my thesis and have forever
inspired me to do more.
I would like to give special thanks to my Masters in Building Science colleagues for helping me
throughout my two excellent years at USC.
Finally, I would like to thank my family who supported me throughout my life.
COMMITTEE MEMBERS
Joon-Ho Choi
Assistant Professor
USC School of Architecture
joonhoch@usc.edu
Marc Schiler
Professor
USC School of Architecture
marcs@usc.edu
Yao-Yi Chiang
Associate Professor (Research)
Spatial Sciences Institute
yaoyic@usc.edu
Dr. Okheon Lee
Ministry of Trade, Industry and Energy, Korea
Table of Contents
ACKNOWLEDGEMENTS
ABSTRACT
Chapter 1: Introduction
1.2 Building Energy Consumption baseline
1.3 Data Analysis Methods
1.4 Machine Learning
1.5 Research Outcome
1.7 Software
1.8 Key Terminology
1.9 Chapter 1 Summary
2.1 Relevance of Building Components in Energy Consumption
2.1.1 Examining the Role of Building Envelope for Energy Efficiency in Office Buildings in India
2.1.2 Passive building energy savings: A review of building envelope components
2.1.3 Building Energy Performance Parameters
2.2 Current Energy Standards
2.2.1 ASHRAE and IECC comparison
2.2.2 Energy Consumption Comparison Experiment
2.3 Creating an Energy Guideline/Baseline
2.3.1 Steps to Develop a Baseline: A Guide to Developing an Energy Use and Energy Intensity Baseline
2.3.2 Building Energy Use Benchmarking (Baseline)
2.4 Data Driven Analysis and Machine Learning
2.4.1 A review of data-driven building energy consumption prediction studies
2.4.2 Machine Learning Approach
2.5 Chapter 2 Summary
3.1 Workflow Diagram
3.2 Data Analysis Strategies
3.2.1 Stepwise and Forward Regression
3.2.2 Decision Tree & Random Forest
3.3 Data Analysis and Software
3.3.1 SPSS
3.3.2 WEKA
3.4 Validation & Results
3.4.1 Simulation
3.4.2 Results Comparison
3.5 Chapter 3 Summary
Chapter 4: Data Analysis and Building Component Relations
4.1 Data Categorization and Observation
4.2 Regression
4.2.1 SPSS Software Setup Overview
4.2.2 Regression Output
4.2.3 Regression Results Key Factors
4.3 Machine Learning
4.3.1 Weka Software Setup Overview
4.3.2 J48 Decision Tree Results
4.4 Parameter Ranking
4.4.1 Regression Results
4.4.2 Decision Tree Ranking
4.4.3 Ranking according to an expert
4.4.4 Limitations
4.5 Chapter Summary
Chapter 5: Simulations and Validation
5.1 Design Builder Models Criteria
5.1.2 The Simulation Components
5.2 Simulation Results
5.2.1 Simulations for Ranking Systems
5.2.2 Simulation Results Analysis
5.3 Individual Components Improvement Based on Decision Tree
5.3.1 Model Simulations Criteria
5.3.2 Simulation Results and Analysis
5.4 Research Outcome and Limitations
5.5 Chapter 5 Summary
Chapter 6: Conclusion and Future Work
6.1 Research Summary
6.2 Evaluation of the methodology and improvement to current workflow
6.3 Limitations
6.4 Research Applicability
6.5 Future Work
6.5.1 Using Other Machine Learning Algorithms
6.5.2 Implement Other Variables and Units
6.5.3 Automation and Speed Improvements
6.5 Summary
REFERENCES
APPENDIX
ABSTRACT
Although building performance simulation and energy management technologies are in widespread
use, buildings continue to raise environmental and energy resource issues.
Considering the goals of Architecture 2030, the push to find innovative strategies to
lower carbon emissions is more important now than ever. Existing building energy guidelines
are also outdated, generalized, and poorly coordinated with one another, which creates
confusion, so an update would be useful.
This research follows a data analysis approach, using large amounts of existing building data to
explore the relationships among building components and their impacts on energy usage. Using both
statistical and machine learning techniques, building components are ranked according to their
impact on energy consumption.
By prioritizing these critical components, a more tailored building energy guideline is to be
formulated. Models are simulated using building simulation software to validate this approach.
The proposed guideline should clarify current standards and offer more specific and efficient
guidance.
KEYWORDS: Building Energy Guideline, Building Components, Statistical Analysis, Machine
Learning, Building Simulation
Chapter 1: Introduction
Buildings continue to raise environmental and energy resource issues. In 2019, buildings
consumed 21 quadrillion British thermal units (BTU), which was equal to 28% of total U.S.
energy consumption for the year (U.S. Energy Information Administration, 2020). Because
buildings use so much energy, their contribution to climate change is vast. Therefore, it is crucial to
address this through the built environment and pave the way to a more carbon-neutral future. At a
special event hosted by USC, the renowned architect Edward Mazria spoke about Architecture
2030, a non-profit organization founded in 2002 whose sole purpose is to transform the
built environment from a major contributor of greenhouse gas emissions to a central part of the solution
to the climate change crisis (Architecture 2030, 2015). The 2030 Challenge it established calls on
architects and the building community to adopt the practices necessary to make
all new buildings, developments, and major renovations carbon neutral by 2030. Not only
is it important to strive to use low carbon footprint materials and equipment, but it is equally
important to challenge the overall energy management strategies over the buildings’ lifespan.
1.1 Building Energy Performance
To pursue the building energy reduction objective, building energy performance estimation
has become a very attractive approach. This includes the measurement and benchmarking of
whole-building energy consumption. This strategy can direct schematic design in the early
stages of design and also help evaluate existing buildings for potential retrofits. Building energy
use is influenced by several factors, including the building envelope, local climate, building activities and
internal energy systems. Among these, the building envelope, while it remains an elegant
component that helps shape the architectural aesthetic, is also a crucial factor in determining the
energy performance. As stated by the Institution of Structural Engineering (1999), “the building
envelope is described as the climate moderator and is the first line of defense against the impact
of the external climate on the indoor environment.” Not only does the envelope affect the heating
and cooling loads, but it also plays a key role in the need for interior artificial lighting. This makes it a
valuable factor to consider in the realm of building energy simulation. Figure 1.1 summarizes the impact
of sustainable building envelope design on building sustainability. To further the study of
building energy simulation and conservation, this research explores the sensitivity of energy
consumption to building envelope materials.
Figure 1.1: Influential Factors on Building Energy Performance [9]
1.2 Building Energy Consumption baseline
The baseline model is an essential prediction used for identifying energy savings in energy
retrofitting projects or energy management programs. With the aim of addressing and reducing
energy consumption in a facility, building energy consumption baselines are considered here
and a potential one is suggested. Creating a baseline for current energy consumption
assists both the stakeholders and the design team in evaluating building energy performance and
in understanding the energy expenditures associated with building operation costs.
Identifying high-performance facilities also supports recognition and the replication of
sustainable practices. The baseline is the starting point for setting energy efficiency improvement goals, and it
provides a comparison point for assessing future efforts and tracking overall performance trends. For
instance, the 2030 Challenge established by Architecture 2030 uses the average or median energy
consumption of existing U.S. commercial buildings reported by the 2012 Commercial Building
Energy Consumption Survey as its baseline for the target goals (Architecture 2030, 2015).
Energy use intensity (EUI) is the key metric used for the energy consumption baseline. It expresses
building energy use as a function of building size, normally square footage, with the unit in
kBtu/sf yr. The annual EUI is obtained by dividing the total annual energy consumption by the
total gross floor area of the building (Energy Star, 2016). Buildings with different principal
activities have different EUIs. For example, hospitals have relatively high EUIs because they house
large amounts of testing and inspection instruments, which draw high electrical loads.
Generally, the lower the EUI value, the better the building energy performance. Figure 1.2 shows
the median source EUIs for different building types, derived from Portfolio Manager and
the Department of Energy’s nationally representative Commercial Building Energy Consumption
Survey (CBECS).
Figure 1.2: Median Source EUI for Different Building Types [5]
In this research, building site EUI was selected for the study of building energy use estimation. There
are different ways to predict building EUI, with different levels of accuracy. Estimating and
modeling building EUI precisely, especially at the community or urban level, is an essential
process for future energy benchmarking and urban energy infrastructure planning (Ma & Cheng,
2016).
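The EUI definition above (total annual energy divided by gross floor area) can be sketched in a few lines. This is a minimal illustration, not the thesis workflow: the function name, the example building, and its consumption figures are hypothetical, and the only fixed quantities are the standard conversions of 1 kWh = 3.412 kBtu and 1 therm = 100 kBtu.

```python
# Standard energy unit conversions to kBtu.
KBTU_PER_KWH = 3.412     # 1 kWh of electricity = 3.412 kBtu
KBTU_PER_THERM = 100.0   # 1 therm of natural gas = 100 kBtu


def site_eui(electricity_kwh, gas_therms, gross_floor_area_sf):
    """Return annual site EUI in kBtu/sf yr (hypothetical helper)."""
    if gross_floor_area_sf <= 0:
        raise ValueError("gross floor area must be positive")
    total_kbtu = electricity_kwh * KBTU_PER_KWH + gas_therms * KBTU_PER_THERM
    return total_kbtu / gross_floor_area_sf


# Hypothetical building: 500,000 kWh and 10,000 therms per year, 50,000 sf.
example_eui = site_eui(500_000, 10_000, 50_000)
print(f"Site EUI: {example_eui:.2f} kBtu/sf yr")
```

Because the result is normalized by floor area, buildings of different sizes can be compared on the same scale, which is what makes EUI usable as a baseline metric.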
1.3 Data Analysis Methods
Stepwise regression
Stepwise regression builds a regression model iteratively: at each step, candidate
independent variables are compared and either added to or removed from the model, and
the statistical significance of the change is tested. The forward selection method starts
with no variables and adds one new variable at a time, testing each for statistical
significance.
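The forward selection idea described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the SPSS procedure used later in the thesis: the data are synthetic, the variable names are hypothetical, and a fixed partial F-statistic threshold (`f_enter = 4.0`) stands in for a proper p-value test.

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares of an ordinary least squares fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X]) if X.shape[1] else np.ones((len(y), 1))
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)


def forward_select(X, y, names, f_enter=4.0):
    """Add one variable per step while the best candidate's partial F exceeds f_enter."""
    n = len(y)
    selected, remaining = [], list(range(X.shape[1]))
    current_rss = rss(X[:, selected], y)  # intercept-only model to start
    while remaining:
        best = None
        for j in remaining:
            trial = selected + [j]
            trial_rss = rss(X[:, trial], y)
            df_resid = n - len(trial) - 1
            # Partial F-statistic for adding variable j to the current model.
            f_stat = (current_rss - trial_rss) / (trial_rss / df_resid)
            if best is None or f_stat > best[1]:
                best = (j, f_stat, trial_rss)
        if best[1] < f_enter:
            break  # no remaining variable improves the fit enough
        selected.append(best[0])
        remaining.remove(best[0])
        current_rss = best[2]
    return [names[j] for j in selected]


# Synthetic data: only the first and third variables actually drive the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.5, size=200)
names = ["wall_u_value", "roof_u_value", "glazing_shgc", "window_wall_ratio"]
selected = forward_select(X, y, names)
print(selected)
```

The order in which variables enter gives a rough ranking of their explanatory power, which is the property a component ranking relies on.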
Decision Tree
The decision tree is one of the most widely used data mining methods. A decision tree is a
flowchart-like diagram that segregates sets of data into predefined classes, thereby providing categorization,
description and generalization of given datasets. Each branch of the decision tree represents a
possible outcome of a test on an attribute, and the nodes at the ends of the branches (the leaves)
represent the end results.
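To illustrate how a tree decides which attribute to branch on, the sketch below computes information gain, the entropy-based criterion used by ID3-style trees (C4.5, which Weka's J48 implements, uses a related gain ratio). The toy records, attribute names, and class labels are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting the records on one attribute."""
    total = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy records: glazing type separates the EUI classes, roof color does not.
rows = [
    {"glazing": "single", "roof": "dark"},
    {"glazing": "single", "roof": "light"},
    {"glazing": "double", "roof": "dark"},
    {"glazing": "double", "roof": "light"},
]
labels = ["high", "high", "low", "low"]

gains = {attr: information_gain(rows, labels, attr) for attr in ("glazing", "roof")}
best_attribute = max(gains, key=gains.get)
print(gains, best_attribute)
```

The attribute with the highest gain becomes the root of the tree, so attributes that appear near the top of a fitted tree are the ones that best separate the classes.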
1.4 Machine Learning
Machine learning is an application of artificial intelligence (AI) that can automatically learn and
improve by using existing data. It consists of a set of techniques that can
automatically uncover patterns in data and then use them to predict future data. The three main types of
machine learning methods are supervised learning, unsupervised learning, and reinforcement
learning. Supervised learning methods discover a mapping from inputs to outputs, provided
that input-output pairs are specified. Unsupervised learning methods are given only the inputs
and try to recognize noteworthy patterns in the data. Unlike in supervised learning, the
target is not clearly defined, so it is less certain which patterns to look for, and there is no
specific measure of error. Lastly, the less commonly used reinforcement learning is useful for
learning how to act when given occasional reward or penalty signals.
It becomes impractical to run detailed energy calculations for each building as the scale increases. This
is where data-driven science plays its part: by learning from existing building data, it can make
generally reliable predictions. The calculation and prediction of building energy consumption is
essential for energy planning and conservation, and data-driven models provide a practical approach
for predicting it [17].
Figure 1.3: ML types and hierarchy of common practices. [17]
1.5 Research Outcome
The primary objective of this research is to provide stakeholders, such as energy distributors
and/or building owners, with a reliable energy diagnostic methodology to assess their building
energy performance. This includes the establishment of a new building energy guideline by
understanding and analyzing trends in building data in order to establish the critical building
components that should be prioritized. This approach can lead to an effective benchmark for
energy consumption, which will contribute to building performance efficiency whilst increasing
sustainability. The building data come from integrated energy master plans prepared for William S.
Hart Union High School District and Cerritos Community College District, which together include
over 200 buildings. These two
projects are in the Greater Los Angeles area, in two different California climate zones. Each
building dataset comes with the annual and monthly fuel usage breakdown in electricity and
natural gas, as well as the detailed monthly energy consumption by end use: heating, cooling, fans,
lighting and miscellaneous sources. The annual and monthly EUI can therefore be easily obtained for this
research.
The façade features which may influence the building energy performance will be investigated to
determine the key performance indicators. Typical climate factors in these two different climate
zones will be summarized, for example the heating degree days and cooling degree days. Different
statistical methods will be employed to conduct the data mining process, in order to determine the
correlation between building EUI and façade features. The dataset will be split into a
training dataset and a testing dataset for cross-validation purposes. Multivariable linear
regression, classification and artificial neural networks are the three primary approaches chosen to
analyze the data. The accuracy of the results will be tested through cross evaluation and a case study. The
major outcome of the research will be an improved baseline model or formula for building
EUI as a function of façade features and typical climate factors.
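The training/testing split and the multivariable linear regression described above can be sketched as follows. This is a minimal illustration under loud assumptions: the data are synthetic, the feature names (window-to-wall ratio, wall U-value, cooling degree days) and coefficients are invented for the example, and a single hold-out split is used rather than full k-fold cross-validation.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 120

# Synthetic, hypothetical features for illustration only.
wwr = rng.uniform(0.1, 0.6, n)       # window-to-wall ratio
wall_u = rng.uniform(0.05, 0.5, n)   # wall U-value
cdd = rng.uniform(500, 2000, n)      # cooling degree days

# Assumed linear relationship plus noise; these coefficients are not from the thesis.
eui = 30 + 40 * wwr + 25 * wall_u + 0.01 * cdd + rng.normal(0, 2, n)

# Design matrix with an intercept column.
X = np.column_stack([np.ones(n), wwr, wall_u, cdd])

# Shuffle once, then hold out 25% of the buildings for testing.
idx = rng.permutation(n)
train, test = idx[:90], idx[90:]

# Fit on the training set only, then score on the unseen test set.
coef, *_ = np.linalg.lstsq(X[train], eui[train], rcond=None)
pred = X[test] @ coef

rmse = float(np.sqrt(np.mean((eui[test] - pred) ** 2)))
ss_res = float(np.sum((eui[test] - pred) ** 2))
ss_tot = float(np.sum((eui[test] - eui[test].mean()) ** 2))
r2 = 1.0 - ss_res / ss_tot
print(f"test RMSE = {rmse:.2f}, test R^2 = {r2:.2f}")
```

Scoring on buildings the model has not seen is what guards against a formula that merely memorizes the training data.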
1.7 Software
Design Builder
Design Builder is building energy simulation software in which buildings can be modeled to a certain
level of detail and simulated to produce a wide range of reports for comparing the performance of design options.
Weather data can be easily incorporated for a more accurate representation.
Weka
Weka is a data analysis and prediction software that uses machine learning algorithms for
training and prediction. It includes a suite of useful machine learning algorithms that can be used
to analyze large data sets and output results relatively quickly.
SPSS
SPSS is a statistical software package that includes a set of programs for data analysis. Its primary
purpose is to analyze large sets of scientific data of the kind used for market research, data mining,
etc.
1.8 Key Terminology
• Energy Use Intensity
Energy use intensity expresses energy usage per square foot of floor area, with the unit in kBtu/sf yr.
The EUI is calculated by dividing the total energy consumption within a period by the
gross floor area of the facility.
• Energy Performance Benchmarking
It is a measurement of a building’s energy performance over time. It is compared with similar
buildings or simulations of a referenced building that had been built to a specific standard or
building code.
• Data-Driven Model
Models that build relationships between input and output data using statistical analysis and
machine learning techniques.
• Linear Regression
It is a data processing method to establish the relationship between a dependent variable and one
independent variable.
• Multivariable Regression
It is a data processing method to estimate the linear relationship between several independent
variables and one dependent variable.
• Classification
Classification is a data processing method for separating data into different categories, also known
as sub-populations, and then linking data vectors to these categories.
1.9 Chapter 1 Summary
Saving considerable energy in the building sector is the ultimate goal. Strategizing energy
planning and monitoring consumption patterns are key steps toward that goal. Research of this
kind helps explain how different components of a building affect energy
consumption. Establishing a new guideline in accordance with the key parameters will help
policymakers, designers and building owners plan future buildings more strategically. The
many completed and ongoing studies in this field are a testament to its importance.
Each available approach has its own methodology and set of tools, but also its own array
of limitations. The next chapter reviews some of the relevant work that has been done.
Chapter 2: Background and Literature Review
2.1 Relevance of Building Components in Energy Consumption
Building energy usage is affected by several critical components of the building. It is necessary to
explore and understand how these components contribute to building energy consumption
in order to prioritize which of them to improve.
2.1.1 Examining the Role of Building Envelope for Energy Efficiency in Office Buildings in
India
The overall energy usage of a building is the combined result of all of its component factors,
so it is important to understand the role of each component. In a study conducted by
Farheen Bano and Mohammad Arif Kamal [21], various building components were investigated
to understand their effect on heating and cooling loads. The study examined
several commercial buildings across different climate zones in India.
Building data suggested that HVAC accounts for the largest share of energy consumption in
commercial buildings. The HVAC demand is directly proportional to the internal heat gain through
the building envelope, equipment, lighting, occupancy patterns and air infiltration.
An investigation of building orientation revealed that in warm climates, the northeast-southwest
orientation had greater energy consumption than the northwest-southeast, north-south and east-west
orientations. The recommended orientation is northwest-southeast, and the building shape
should have an aspect ratio of 1:2 and a width of 15 m, so that heat transfer is kept to a minimum while
taking maximum advantage of daylight.
The building envelope determines the amount of solar radiation and wind that enters a
building, and thus affects the HVAC requirements dramatically. It was seen that for single-story
buildings the maximum heat gain occurs through the roof, while in multi-story buildings it occurs
through the walls and windows.
The rate of heat flow through a material is characterized by the material's R-value or, in metric
units, its U-value (watts per square meter-kelvin, W/(m²·K)). In India, the Energy Conservation Building
Code (ECBC) is the most widely used code, and it caters to the wide range of climatic conditions across the
different regions of India. The code recommends U-values of 0.077 for walls and
0.0459 for roofs [21]. To improve fenestration, ECBC recommends a window-to-wall ratio of less than 60%
and, to reduce solar heat gain, suitable shading devices such as vertical louvers and horizontal
projections, along with heat-resistive glazing. Their investigation showed dramatic improvements from the
use of shading devices. Investigation of different types of glazing showed that the U-value and SHGC
of Low-E glass are the lowest while its visible light transmittance (VT) is high, so it is
recommended for office buildings. By incorporating such changes in the parameters, the energy
savings potential ranges from 17 to 42%.
2.1.2 Passive building energy savings: A review of building envelope components
A review study conducted by Suresh, Srikanth and Robert also explores how different building
envelope options affect energy efficiency. It reviews several research journals and concludes that using
better materials for the various envelope components can lead to significant energy savings [1].
One study examined the thermal performance of 64 buildings in
subtropical Hong Kong. The DOE-2 building energy simulation tool was used, and it revealed that
an effective building envelope design can potentially save as much as 47% of peak cooling
demands [2]. The use of ventilated walls showed cooling energy savings of 40% [3]. A
different simulation study, conducted by Singh and Garg, observed 10 different glazing types in
five different climate zones in India. The results showed that the annual energy savings depend not
only on thermal transmittance (U-value) but also on the building orientation, climate
conditions and building parameters [4]. It was observed that by using roof shading options, cool roof
paints and coatings, or compound roof systems, the heat transfer coefficient of the roof (with a roof
area of 2240 ft²) can be reduced from 0.581 Btu/(h·ft²·°F) to 0.095 Btu/(h·ft²·°F), which would improve
performance significantly [5]. It was determined that lightweight aluminum roofs with lighter
colored roof paints such as white, off-white, brown and green yielded 9.3%, 8.8%, 2.5% and 1.3%
reductions in cooling loads, respectively [6]. A detailed energy analysis of ventilated roof buildings in Italy
showed that during the summer, energy savings can reach up to 30% [7].
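To give a sense of scale for the roof figures quoted above, the sketch below applies the steady-state conduction relation Q = U × A × ΔT to the two heat transfer coefficients. The roof area and U-values come from the text; the 20 °F indoor-outdoor temperature difference is an assumption made purely for illustration, and solar gains and thermal mass are ignored.

```python
def roof_heat_flow_btu_per_h(u_value, area_sf, delta_t_f):
    """Steady-state conductive heat flow Q = U * A * dT, in Btu/h."""
    return u_value * area_sf * delta_t_f


ROOF_AREA_SF = 2240.0   # roof area from the cited study
U_BEFORE = 0.581        # Btu/(h*ft2*F), before improvement
U_AFTER = 0.095         # Btu/(h*ft2*F), after improvement
DELTA_T_F = 20.0        # assumed temperature difference, not from the source

q_before = roof_heat_flow_btu_per_h(U_BEFORE, ROOF_AREA_SF, DELTA_T_F)
q_after = roof_heat_flow_btu_per_h(U_AFTER, ROOF_AREA_SF, DELTA_T_F)
reduction = 1.0 - q_after / q_before

print(f"before: {q_before:.1f} Btu/h, after: {q_after:.1f} Btu/h")
print(f"conductive heat flow reduced by {reduction:.1%}")
```

Because area and temperature difference cancel in the ratio, the roughly 84% reduction in conductive heat flow holds for any assumed ΔT; only the absolute Btu/h values depend on it.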
Therefore, when the building envelope elements are improved, the energy savings can be
significant. Not only reforming an existing energy guideline (ASHRAE or IECC), but also creating one
tailored to different climate conditions and to the relative impact of each component, would be
a proper step towards the net-zero ambition.
2.1.3 Building Energy Performance Parameters
The International Energy Agency (IEA) has been working on building energy performance for four
decades. According to the Annex 53 project established by the IEA, six main factors determine building
performance: climate, building envelope, building systems, operations and maintenance,
occupant behavior and indoor environmental conditions [22]. When estimating building energy
performance via modeling software, these key parameters should be appropriately selected for
accurate results.
A study of the construction characteristics of existing commercial buildings was conducted [15].
64 buildings in sub-tropical Hong Kong were surveyed for analysis. The results indicated that
different envelope constructions equate to substantial differences in external heat gains and
cooling energy requirements. It was concluded that a building with a more effective envelope can
reduce the cooling requirements by as much as 35% compared to a poorly performing envelope.
Climate data analysis involves the interpretation of the annual pattern of the climatic factors
influencing the indoor thermal comfort of the building. Key climatic factors include local
temperature, humidity, solar radiation, wind speed, etc. In his research, Gugliemetti showed the
significant role of climatic factors in forecasting energy consumption [16].
Observations and Limitations
The studies reviewed clearly show the importance of building components. The first offers
recommendations for office buildings based on simple simulations in the Design Builder software.
The second shows how significantly the overall building energy use can be improved
by adjusting and simulating building envelope components. The third further reinforces this
idea by evaluating a large number of buildings. What these studies lack is a comparative
aspect: they do not compare building component factors to see which one has
the greater effect.
2.2 Current Energy Standards
Energy standards have been an effective way to push the movement towards a greener future.
Standards like ASHRAE and IECC provide an informative guide to designing future buildings.
Such guides are used throughout the world, so they are quite generalized and accommodating. In
this section we shall explore the differences between two widely used building energy guides.
2.2.1 ASHRAE and IECC comparison
Existing building energy codes establish baseline requirements and govern building construction
methodology [8]. In the United States, IECC and ASHRAE are most commonly employed in the
planning process of buildings. ASHRAE and IECC, though not homogenous, provide similar
guidelines and requirements for specific building aspects. Annually updating them allows for up-
to-date sustainability and innovation. However, the presence of multiple baselines and
requirements, so frequently updated, can be misleading and confusing for users. This results in
buildings that often do not meet the requirements and must compensate in other ways.
This section of Chapter 2 reviews articles and journal entries that discuss the problems that arise
from having multiple baseline options to choose from.
Comparison
In the paper presented by the U.S. Department of Energy’s Building Technologies Program titled
“Comparison of Standard 90.1-2010 and the 2012 IECC with Respect to Commercial Buildings”,
the authors discuss key differences between the two baselines with respect to specific building
system types. A few prominent differences are listed in Table 2.1.
Table 2.1: Some ASHRAE vs. IECC differences
There are evident differences between the definitions and requirements of ASHRAE and IECC.
The fact that spaces are considered as zones by ASHRAE provides a more thorough application
of heated space considerations. IECC does not differentiate between the semi-heated and heated
spaces, therefore providing a more general baseline.
The categorization of spaces in ASHRAE is more general in comparison to the specificity of
IECC’s categorization of spaces. ASHRAE has a higher chance of defining a space as residential,
whereas IECC’s baseline requirements would specify the occupancy and purpose of the space
before categorizing it as residential or commercial.
IECC primarily considers above grade walls for its calculation of percentage of glazing. ASHRAE
Standard 90.1 considers both above and below grade walls. ASHRAE 90.1’s baseline claims that
for each space-conditioning category, the vertical fenestration area cannot exceed 40% of the gross
wall area.
Wall: that portion of the building envelope, including opaque area and fenestration, that is vertical
or tilted at an angle of 60 degrees from horizontal or greater. This includes above- and below-grade
walls, between floor spandrels, peripheral edges of floors, and foundation walls. For the purposes
of determining building envelope requirements, the classifications are defined as follows:
Above-Grade Wall: a wall that is not a below-grade wall.
Below-Grade Wall: that portion of a wall in the building envelope that is entirely below the finish
grade and in contact with the ground.
Gross Wall Area: the area of the wall measured on the exterior face from the top of the floor to the
bottom of the roof.
The 2012 IECC guidelines state that the vertical fenestration area should not exceed 30% of the gross
wall area, but only when considering above-grade walls.
The lack of agreement between the definitions of above-grade and below-grade walls also
causes a split between ASHRAE and IECC [7].
This lack of coordination is further amplified when considering the window-to-wall ratio (WWR)
limitations. For example, since ASHRAE Standard 90.1 counts both above-grade and below-grade
walls, a building under Standard 90.1 will reach its 40% limit more slowly than a building
under IECC, because a larger total wall area is considered. Even though ASHRAE allows 40% as
opposed to IECC’s 30%, a building under ASHRAE might still have a lower WWR than one under IECC.
This difference in definitions also creates ambiguity when making decisions,
and decisions may be made based on which standard provides more freedom for the
designers and construction workers. Therefore, here lies the potential to pursue a more unified
building code compliance approach.
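The effect of the two wall-area definitions can be illustrated with a short calculation. The sketch below is only an illustration: the building dimensions and the function name `wwr` are hypothetical, and the 40% and 30% limits are the ones quoted above.

```python
def wwr(window_area, above_grade_wall, below_grade_wall, include_below_grade):
    """Window-to-wall ratio as a fraction of the gross wall area considered."""
    gross = above_grade_wall + (below_grade_wall if include_below_grade else 0.0)
    return window_area / gross

# Hypothetical building (areas in sq ft); not taken from any dataset.
window, above, below = 3500.0, 10000.0, 2500.0

# ASHRAE 90.1 basis: above- and below-grade walls both count towards gross wall area.
wwr_ashrae = wwr(window, above, below, include_below_grade=True)
# IECC basis: only above-grade walls count.
wwr_iecc = wwr(window, above, below, include_below_grade=False)

ashrae_ok = wwr_ashrae <= 0.40
iecc_ok = wwr_iecc <= 0.30
print(f"ASHRAE basis: WWR = {wwr_ashrae:.0%}, within 40% limit: {ashrae_ok}")
print(f"IECC basis:   WWR = {wwr_iecc:.0%}, within 30% limit: {iecc_ok}")
```

With these assumed areas the same glazing passes under one definition and fails under the other, which is the kind of ambiguity described above.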
2.2.2 Energy Consumption Comparison Experiment
Due to the varying differences between existing guidelines and their annual updates, many
experiments have been conducted by employing each baseline’s requirements on a building
simulator to test the energy consumption improvements. These were recorded as percentage
values. One such experiment was conducted in the thesis paper “Comparison of ASHRAE
Standard 90.1, 189.1 and IECC Codes for Large Office Building in Texas” from Texas A&M
University.
The purpose of this study was to show the lack of consistency between the different baselines. For
ASHRAE, standards from 1989, 1999, 2004, 2007, 2009 and 2010 were tested. IECC 2009 and
2012 were also considered for this experiment. ASHRAE Standard 90.1-1989 was taken as the
base-case and three climate zones within Texas were tested.
The following table displays the results concluded from the thesis experiment:
Table 2.2: Baseline Improvement Percentage
This experiment was first conducted for site energy consumption. The results are displayed in
Table 2.2 and indicate the lack of consistency when it comes to annual improvements.
It is evident that ASHRAE Standard 90.1-2009 provided more of an improvement than ASHRAE
90.1-2010. The confusion that these results cause interferes with basic decisions that need to be
made for building construction [12].
BASELINE                      PERCENTAGE IMPROVEMENT ABOVE BASE-CASE
ASHRAE Standard 90.1-1999     19.6%-21.1%
ASHRAE Standard 90.1-2004     25.5%-33.7%
ASHRAE Standard 90.1-2007     30.3%-35.0%
ASHRAE Standard 90.1-2009     51.4%-57.0%
ASHRAE Standard 90.1-2010     43.1%-51.3%
IECC 2009                     30.0%-36.4%
IECC 2012                     38.1%-41.0%
Observations and Limitations
In the United States, the International Energy Conservation Code (IECC) is in use or adopted in
49 states, the District of Columbia, the U.S. Virgin Islands, NYC and Puerto Rico, while ASHRAE
Standard 90.1 is adopted in the US and around the world. In the website article “ASHRAE 90.1
Vs. IECC — Is it Time to Take a Stand?”, the implementation of these building codes is discussed
in detail. The article states that within the United States, the availability of choices causes
confusion and a dilemma for every building project [9]. Though IECC has been adopted by most
jurisdictions, important institutions such as the U.S. Green Building Council still enforce
ASHRAE 90.1.
The previous sections demonstrate strong parallels between the language of ASHRAE 90.1 and
IECC. Though they have the same parameters, they have different requirements. Between the two,
it is clear that they both have different results for building code requirements and energy
performance. The fact that they continue to be published separately is concerning as their
coexistence creates confusion in the planning process of projects.
IECC is typically the legal design requirement. However, one of IECC’s prerequisites is the
consideration of ASHRAE 90.1, which opens the design process up to speculation when
considering which baseline to comply with. This burden on practitioners of the building sciences
is an unnecessary one, and neither ASHRAE nor IECC is likely to renounce its position as a
primary national energy code baseline.
One solution to this problem could be to publish a single unified standard that would
satisfy and surpass the current standard(s). This would reduce uncertainty in the initial planning
process of buildings. The creation of a single code would result in better buildings. Another
approach would be to compare the two baselines and settle on one value to comply with, a value
that would satisfy and surpass the requirements of both available baselines.
2.3 Creating an Energy Guideline/Baseline
It is imperative to understand the methodology of establishing a guideline before updating it.
Baselining is the act of measuring energy use and energy intensity at a determined level of detail
for the purpose of establishing a benchmark for future comparison to itself. [13]
2.3.1 Steps to Develop a Baseline: A Guide to Developing an Energy Use and Energy Intensity Baseline
To assess the various parametric models that are proposed in
this study, understanding how to create a baseline model is
critical. There exists a guide to show how to approach this
task in a methodological manner intended for development
companies. It goes into detail to cover all the bases of
creating a baseline. [13]
It is necessary to set geographic boundaries within which the baseline applies. It is encouraged
to choose the baseline year according to the reliable data that is available. Energy records
should preferably be gathered for the selected baseline year. Selecting units of output for
comparison is essential. Energy Use Intensity (EUI), defined as the energy use per unit of floor
area (kBtu per sq ft), is the most generally used metric for comparisons. Calculating this metric
by grouping and category is important, as is tracking the overall EUI changes. Companies should
do so as per their focused needs. Lastly, the changes to the energy usage by fuel type should be
tracked. These steps are summarized in Figure 2.1, taken from Steps to Develop a
Baseline: A Guide to Developing an Energy Use and Energy Intensity Baseline and the Reporting
Requirements for the Save Energy Now LEADER Pledge [13].
Figure 2.1: Steps to develop a baseline [13]
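The EUI calculation and its year-on-year tracking can be sketched as follows. This is a minimal illustration only: the energy records, floor area and function names are hypothetical and are not taken from the guide [13].

```python
def eui(annual_energy_kbtu, gross_floor_area_sqft):
    """Energy Use Intensity in kBtu per sq ft per year."""
    return annual_energy_kbtu / gross_floor_area_sqft

def percent_change(baseline, current):
    """Signed change relative to the baseline year, in percent."""
    return (current - baseline) / baseline * 100.0

# Hypothetical annual records by fuel type (kBtu) for one facility.
records = {
    2019: {"electricity": 3_000_000, "natural_gas": 1_500_000},  # baseline year
    2020: {"electricity": 2_700_000, "natural_gas": 1_350_000},
}
floor_area = 50_000  # sq ft, assumed constant between the two years

baseline_eui = eui(sum(records[2019].values()), floor_area)
current_eui = eui(sum(records[2020].values()), floor_area)
change = percent_change(baseline_eui, current_eui)

# Tracking by fuel type, as the last step recommends.
eui_by_fuel = {fuel: eui(kbtu, floor_area) for fuel, kbtu in records[2020].items()}

print(f"Baseline EUI: {baseline_eui:.1f}, current EUI: {current_eui:.1f}, change: {change:.1f}%")
```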
2.3.2 Building Energy Use Benchmarking (Baseline)
The annual operational energy expenses of commercial and government buildings are more than $2
per square foot in the United States (U.S. Department of Energy). This energy cost and social cost
has raised much awareness around the importance of energy management. By making energy
performance more accessible and measurable, not only would it further drive new investment but
also create approximately 5 to 15 green jobs per $1 million invested (U.S. Department of Energy,
2016). Building energy benchmarking is the approach to evaluate the building performance and
establish the comprehensive energy reduction goal, which has already become a common practice
across non-residential building markets.
As defined by U.S. Department of Energy,
“Benchmarking is the practice of comparing the measured performance of a device, process,
facility, or organization to itself, its peers, or established norms, with the goal of informing and
motivating performance improvement. When applied to building energy use, benchmarking
serves as a mechanism to measure energy performance of a single building over time, relative
to other similar buildings, or to modeled simulations of a reference building built to a specific
standard (such as an energy code).”
Benchmarking is a process of comparing building performance to data in the same realm, for
instance, the building's energy use from the same time last year or the performance
of similar facilities (Energy Star, 2016). This comparison plays an important role in helping state
or local governments, property owners, facility managers and designers to facilitate energy
management, evaluate energy performance and assess energy saving opportunities. It has also been
described as “a simple method to inform decision makers with a relative energy performance level
by comparing the whole-building energy performance index of the assessed building with pre-set
benchmarks”.
2.4 Data Driven Analysis and Machine Learning
2.4.1 A review of data-driven building energy consumption prediction studies
Amasyali and El-Gohary made notable observations in their review paper about different data-driven
approaches for building energy consumption prediction. The paper reviewed scopes of prediction,
data processing methods, machine learning algorithms for prediction and performance
measures for energy evaluation [10].
The data driven approach, unlike a physical simulation model that requires detailed building and
environmental parameters, learns from existing building data to make predictions. The methods
reviewed include machine learning algorithms such as support vector machines (SVM), artificial
neural networks (ANN) and decision trees, as well as other statistical approaches.
Development of a data driven model mainly includes four steps: data collection, data
preprocessing, model training and model testing.
For data collection, it should be noted that a very small dataset will produce more inconsistencies
in the result, whereas a very large dataset would require more computational power to process.
The majority of the reviewed studies utilized datasets covering between one month and one year.
After the data is acquired, the step of data processing would involve data cleaning, data integration,
data transformation and data reduction. Data integration means to combine multiple data from
different sources, while data transformation involves changing the format of the data for the
machine learning algorithm to digest it. Lastly, data reduction would reduce the dimensionality of
the dataset which would increase processing efficiency.
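The four preprocessing steps can be sketched on a toy example. The records, field names and the specific choices below (dropping incomplete rows, integer codes, min-max scaling, removing constant columns as a crude form of reduction) are assumptions made for illustration; the reviewed studies use a range of other techniques.

```python
# Hypothetical records from two sources, keyed by building id.
envelope = {
    1: {"wwr": 0.30, "wall": "concrete"},
    2: {"wwr": None, "wall": "brick"},      # missing value
    3: {"wwr": 0.50, "wall": "concrete"},
}
energy = {1: {"eui": 80.0}, 2: {"eui": 95.0}, 3: {"eui": 120.0}}

# Data integration: combine the two sources on the shared building id.
merged = {bid: {**envelope[bid], **energy[bid]} for bid in envelope if bid in energy}

# Data cleaning: drop records that contain a missing value.
clean = {bid: row for bid, row in merged.items() if None not in row.values()}

# Data transformation: encode the categorical column as integer codes
# and rescale the numeric predictor to the range [0, 1].
codes = {label: i for i, label in enumerate(sorted({r["wall"] for r in clean.values()}))}
lo = min(r["wwr"] for r in clean.values())
hi = max(r["wwr"] for r in clean.values())
table = [
    {"wwr": (r["wwr"] - lo) / (hi - lo), "wall": codes[r["wall"]], "eui": r["eui"]}
    for r in clean.values()
]

# Data reduction: remove columns that carry no information (constant values).
constant = [col for col in table[0] if len({row[col] for row in table}) == 1]
reduced = [{k: v for k, v in row.items() if k not in constant} for row in table]

print(reduced)
```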
Once the data is prepared, it is used to train the machine learning algorithm. The most commonly
used techniques in the field of building energy are ANN and SVM, while
decision trees have seldom been used. To evaluate model performance, the key metrics are the
coefficient of variation (CV), mean absolute percentage error (MAPE) and root mean square error
(RMSE).
Figure 2.2: Formulae of CV, MAPE and RMSE [10]
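For reference, the three metrics can be computed as in the sketch below. It uses the commonly quoted definitions (CV taken as the RMSE normalized by the mean of the measured values); the exact notation in [10] may differ, and the sample values are hypothetical.

```python
import math

def rmse(actual, predicted):
    """Root mean square error, in the units of the measured values."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def cv(actual, predicted):
    """Coefficient of variation of the RMSE, in percent of the mean measured value."""
    return 100.0 * rmse(actual, predicted) / (sum(actual) / len(actual))

# Hypothetical measured and predicted energy use values.
measured = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 300.0]

print(f"RMSE = {rmse(measured, predicted):.2f}")
print(f"MAPE = {mape(measured, predicted):.2f}%")
print(f"CV   = {cv(measured, predicted):.2f}%")
```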
It was noted that some of the most impactful factors on the accuracy of the prediction results are
the lack of data and the complexity of energy use behavior. It was also observed that data driven
models do not usually perform well outside the training data range. This, however, can be
countered by having a wider range of data with more variation. Another limitation is the fact that
these models are “black-box” models, meaning that their internal details, such as parameters, are
not known. A hybrid (or gray-box) model that utilizes both physical and data-driven prediction
models would yield better results. Research by Dong et al. [11] concluded that using hybrid models
offered performance improvements.
The value of this review lies in understanding the current state of data-driven approaches
in the realm of building energy modeling.
2.4.2 Machine Learning Approach
Machine learning techniques have high potential in helping to understand building energy
performance better. This includes singular as well as large scale building data analysis. Naturally,
advances in machine learning algorithms have opened new options to assess
building energy. Furthermore, as more building data is gathered, more machine learning
techniques become applicable. The following information was gathered from the review paper by
Fathi [17].
Currently the 14 best approaches to machine learning in the space of Building Energy Performance
Forecasting (BEPF) are as follows: Artificial Neural Networks (ANN), Support Vector Regression (SVR),
Multiple Linear Regression (MLR), Genetic Algorithms (GA), Random Forests (RF), Cluster
Analysis (CA), Bayesian Networks (BN), Gaussian Processes (GP), Gradient Boosting (GB),
Principal Component Analysis (PCA), Deep Learning (DL), Reinforcement Learning (RL),
Auto-Regressive Integrated Moving Average (ARIMA), and Ensemble Prediction (ENS) [17].
Other studies compared the capacity of various machine learning methods in urban level building
performance. Deng et al. [18] compared ANN, SVR, RF, and GB for forecasting the Energy Use
Intensity (EUI) of commercial office buildings in the United States. The study also considered
the individual energy end-uses of HVAC, lighting, and electric loads, based on the 2012 Commercial
Building Energy Consumption Survey (CBECS). It was concluded in this study that SVR and RF
showed better accuracy in predicting building energy performance compared to ANN and GB.
There were more studies that categorized buildings’ functionality, characteristics, and
consumption patterns, to efficiently forecast urban building energy performance. One of the
studies (An, J., Yan, D., & Hong, T., 2018) used clustering and statistical analyses to represent AC
use patterns for over 300 residential buildings in Zhengzhou, China, using building Key
Performance Indicators (KPI).
A study conducted by Azadeh [19] proposed the use of an Artificial Neural Network (ANN) to
improve the analysis of industrial energy consumption. The study concluded that ANN offers
great potential for long-term energy saving predictions. Support Vector Machine (SVM) is another
technique used for consumption analysis [20]. Although this technique achieved high prediction
accuracy, it worked well only with small, less detailed datasets, which limits its usefulness.
Observations and Limitations
According to the studies reviewed, implementing machine learning and data mining methods
in building energy performance forecasting (BEPF) can result in higher levels of accuracy, and
faster and more organized computation for
building energy prediction. There are several strategies that use machine learning in the context of
building energy prediction for singular buildings or groups of buildings. The general goal remains
the same: studying patterns to understand energy usage relationships and improve usage efficiency.
2.5 Chapter 2 Summary
Upon reviewing this background research, several conclusions were drawn. The importance of
building envelope components and their corresponding energy impacts should not be overlooked,
and components should be prioritized according to their effect on energy usage. The discrepancies
between the building standards in use (mainly ASHRAE and IECC) cause confusion and doubt. Given
sufficient building data, machine learning techniques can be implemented to
better evaluate the energy usage patterns. Therefore, considering these points, a potential
methodology can be developed for creating a better tailored building energy code in line with a
building’s location that would surpass the existing base energy standards used at that location.
Chapter 3: Methodology & Techniques
This chapter explains the working procedure of this research study. To formulate a new guideline,
the first task is to assess the impact of different building components on energy usage, which is
done with statistical analysis and machine learning techniques. Then the building components are
ranked according to their influence on energy usage. Prioritizing these factors, models are
simulated in the energy simulation software Design Builder and are compared to the ASHRAE
90.1 Standard models.
3.1 Workflow Diagram
The methodology was broken into three phases. The first is Data Management, where the dataset is
prepared by organizing and altering it according to the respective requirements of the modeling
approach and its software needs. Phase 2 applies this data to the various strategies being explored,
and the output values are compared. Finally, in Phase 3, the established parameters are prioritized
and altered, and then used to simulate a new model whose predicted energy usage results are
compared to those of the existing recommended ASHRAE model. A comparative analysis is
carried out to identify the discrepancies. The workflow breakdown of the three phases is shown in
Figure 3.1.
Figure 3.1: Methodology Diagram
In the following sections of this chapter, the methodology is explored in detail.
3.2 Data Analysis Strategies
This section talks about the different analysis techniques used to find and solidify the relationships
of the different building parameters with the overall building energy usage. Both regression and
random forests techniques are used to cross check the concluded results.
3.2.1 Stepwise and Forward Regression
The data analysis techniques practiced in this study are explained below.
Stepwise Regression
This statistical method can handle a large number of predictor variables, which in this study
are building component variables such as “Window-Wall Ratio”. Its job is to find the
best predictor variables from the available options. It is faster than other model selection methods
since it can run multiple models simultaneously and quickly in the IBM SPSS software.
There are two main types of stepwise selection: forward and backward. Backward regression
starts with a model that includes all variables and eliminates the extraneous variables one at
a time. Forward regression starts with an empty model and adds variables one by one, keeping
or rejecting each according to its statistical significance. For the purpose of this study,
forward regression is preferred, as its primary focus is to report only the highly significant variables.
The p-value indicates the significance of the independent variable. The only p-values
that are accepted are those below 0.05, which indicate high significance. The lower the p-value,
the higher the significance.
The other critical value to note is the R-squared change. The change in R-squared
represents the amount of unique variance that each variable explains above and beyond the other
variables in the model. An R-squared change of 0.024, for example, means that the predictor accounts
for 2.4% of the variation in the model. The higher this value is, the greater the variable's effect
on the overall model.
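The idea of forward selection and the R-squared change can be sketched in a few lines. This is a simplified illustration, not the SPSS procedure: it assumes NumPy is available, it admits a variable when the R-squared change exceeds a hypothetical threshold `min_gain` instead of testing p-values, and the variable names and data are invented.

```python
import numpy as np

def r_squared(columns, y):
    """R-squared of an ordinary least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y))] + columns)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    ss_res = float(np.sum((y - A @ coef) ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return 1.0 - ss_res / ss_tot

def forward_select(data, y, min_gain=0.01):
    """Add one predictor at a time; return (name, R-squared change) per step."""
    y = np.asarray(y, dtype=float)
    selected, steps, current = [], [], 0.0   # the empty model has R-squared 0
    remaining = list(data)
    while remaining:
        scores = {
            name: r_squared([np.asarray(data[c], dtype=float) for c in selected + [name]], y)
            for name in remaining
        }
        best = max(scores, key=scores.get)
        gain = scores[best] - current
        if gain < min_gain:                  # no candidate adds enough; stop
            break
        selected.append(best)
        remaining.remove(best)
        steps.append((best, gain))
        current = scores[best]
    return steps

# Hypothetical predictors for eight buildings.
data = {
    "wwr": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
    "single_glazing": [1, 0, 1, 0, 1, 0, 1, 0],
    "north_facing": [1, 1, 0, 0, 1, 1, 0, 0],
}
# Synthetic EUI that depends only on the first two predictors.
eui = [50 + 30 * w + 4 * g for w, g in zip(data["wwr"], data["single_glazing"])]

steps = forward_select(data, eui)
for name, gain in steps:
    print(f"{name}: R-squared change = {gain:.3f}")
```

In this synthetic case the third predictor is never admitted, because it adds nothing once the first two are in the model.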
3.2.2 Decision Tree & Random Forest
The machine learning techniques that are practiced in this study are discussed in detail below.
Decision Tree
The flowchart-like structure of a decision tree starts with a root node leading to other successive
non-leaf nodes. Each node is a test that applies a specific condition to an input
variable, either binary or categorical, and the branches keep splitting until leaf nodes are reached,
which give a possible value of the predicted output. Decision-making therefore follows a path from
the root node to a leaf node.
Figure 3.2: Decision Tree Concept [29]
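The root-to-leaf path described above can be illustrated with a hand-built tree. The tree, thresholds, variable names and labels below are hypothetical and only show the traversal mechanism; they are not results of this study.

```python
# Each internal node tests one variable against a threshold; leaves are labels.
tree = {
    "test": ("wwr", 0.40),
    "yes": {
        "test": ("roof_u", 0.20),
        "yes": "high EUI",
        "no": "medium EUI",
    },
    "no": "low EUI",
}

def predict(node, sample, path=None):
    """Follow the tests from the root to a leaf; return the label and the path taken."""
    path = [] if path is None else path
    if not isinstance(node, dict):          # reached a leaf node
        return node, path
    feature, threshold = node["test"]
    branch = "yes" if sample[feature] > threshold else "no"
    path.append(f"{feature} > {threshold}: {branch}")
    return predict(node[branch], sample, path)

label, path = predict(tree, {"wwr": 0.55, "roof_u": 0.35})
print(label)
for step in path:
    print("  ", step)
```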
Decision trees can classify and predict categorical variables. The advantage they have over a
regression model is their ability to generate accurate predictive models with flowchart-like tree
structures from which useful information can be extracted quickly. The decision tree segregates the
data, which provides description, categorization, and generalization. In the model, the decision
tree tries to predict the target variable by using a set of predictor variables.
The decision tree has three main components: the root node, internal nodes and leaf nodes. The root
and internal nodes use a binary split test to make decisions, while a leaf node is the outcome of
classification and so holds a categorical target label. The collected data are split into two
disjoint sets: a training set and a testing set. The training data is used as
input for the decision tree algorithm, which outputs a decision tree. The J48 decision tree
algorithm in the Weka software is used.
Each pathway through the decision tree from the root to a leaf node is important to understand
and keep track of, as those pathways carry the most important variables. The accuracy of the
resulting tree is tested by comparing its predictions against the test data. If the accuracy is
acceptable, the tree can be applied to another dataset for prediction. Otherwise, the issue
should be investigated and corrected.
In addition, classification can be performed without complicated computations, and the technique
can be used for both continuous and categorical variables. Furthermore, decision tree model results
provide clear information on the importance of significant factors for prediction or classification.
Singular decision trees are also resistant to outliers, thereby requiring less data processing.
However, the main flaw of the decision tree is that it is susceptible to noisy data. Noise creates
trouble for machine learning algorithms because, if not trained properly, an algorithm can mistake
noise for a pattern and start generalizing from it, which is undesirable.
Random Forest
In a random forest, multiple decision trees are grown. The greater the number of trees, the more
robust the forest. The random forest creates all these decision trees, gets a prediction from each
of the trees and then selects the best solution by means of voting. While growing each tree, the
algorithm considers only a random subset of the variables at each node and searches for the best
split within that subset. This generally results in a better model.
Figure 3.3: Random Forest Concept [29]
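The three ingredients mentioned above (random resampling of the data, a random subset of variables, and voting) can be shown in isolation. The sketch below does not grow any trees; the helper names, feature names and votes are hypothetical, and for a continuous target such as EUI the vote would typically be replaced by an average.

```python
import random
from collections import Counter

def bootstrap(rows, rng):
    """Draw a sample of the same size as the data, with replacement."""
    return [rng.choice(rows) for _ in rows]

def random_feature_subset(features, k, rng):
    """Pick k candidate variables to consider at a split."""
    return rng.sample(features, k)

def majority_vote(predictions):
    """Return the class predicted by the largest number of trees."""
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(42)
rows = list(range(10))                       # stand-ins for building records
features = ["wwr", "roof_u", "wall_u", "cooling_system"]

sample = bootstrap(rows, rng)                # data for one tree
candidates = random_feature_subset(features, 2, rng)

# Hypothetical class predictions from five trees for one building.
votes = ["high", "low", "high", "high", "low"]
forest_prediction = majority_vote(votes)
print(forest_prediction)
```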
The advantage this has over the decision tree is that it reduces over-fitting by averaging the results.
A decision tree is said to overfit to the training data if it becomes too
reliant on irrelevant features, which is the issue with singular trees.
For the purpose of finding the most important variables using the random forest, one critical metric
to consider is the reduction in prediction accuracy when the variables are excluded one at a time.
Another metric is the Gini impurity. These metrics are discussed further in Section 3.3.2. The
method to compute variable importance is the mean decrease in impurity (also
known as Gini importance). At each split in a tree, the improvement in the split criterion
is the importance attributed to the splitting variable, and it is accumulated over all trees. This
value signifies the contribution of that variable to the overall model.
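The impurity decrease at a single split can be worked through numerically. The sketch below uses the standard Gini impurity formula; the class labels and the two candidate splits are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def impurity_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    return gini(parent) - (len(left) / n) * gini(left) - (len(right) / n) * gini(right)

# Hypothetical node with eight buildings, four in each class.
parent = ["high"] * 4 + ["low"] * 4

# A weak split leaves both children mixed.
weak = impurity_decrease(parent, ["high", "high", "high", "low"], ["high", "low", "low", "low"])
# A perfect split separates the classes completely.
perfect = impurity_decrease(parent, ["high"] * 4, ["low"] * 4)

print(f"parent impurity = {gini(parent):.3f}")
print(f"weak split decrease = {weak:.3f}, perfect split decrease = {perfect:.3f}")
```

Summing these decreases for a variable over every split where it is used, and averaging over the trees, gives the importance value described above.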
3.3 Data Analysis and Software
The experiment conducted by Dr. Okheon Lee provided the data to carry out this research. The
data includes detailed information on the building components and energy usage of 200 commercial
buildings in Korea. As with any statistical analysis technique, the data needs to be grouped.
In this case, the data was grouped into three sets: “Large Buildings” (Gross Area > 20,000), “Small
Buildings” (Gross Area < 20,000) and “All Buildings”.
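The grouping can be expressed as a simple filter. The records below are hypothetical, and because the text does not say how a gross area of exactly 20,000 is treated, the sketch assumes such a building falls into the small group.

```python
THRESHOLD = 20_000  # gross-area threshold from the text (same unit as the dataset)

# Hypothetical building records.
buildings = [
    {"id": "A", "gross_area": 35_000},
    {"id": "B", "gross_area": 8_000},
    {"id": "C", "gross_area": 20_000},   # boundary case, assumed to be "small"
]

groups = {
    "All Buildings": buildings,
    "Large Buildings": [b for b in buildings if b["gross_area"] > THRESHOLD],
    "Small Buildings": [b for b in buildings if b["gross_area"] <= THRESHOLD],
}

for name, members in groups.items():
    print(name, [b["id"] for b in members])
```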
3.3.1 SPSS
The IBM SPSS software platform allows users to perform advanced statistical analysis. The
software includes a variety of machine learning algorithms, as well as tools to
adjust the output based on individual needs. For this research, only
stepwise and forward regression are performed. The raw data file with the various parameters
of 200 buildings is grouped and formatted as per the software requirements. Because the
SPSS tools only recognize quantitative values, categorical values like “Wall Material” or “Heating
System” need to be assigned unique numeric codes to be recognized. Only when the
data is ready and complete can the regression tools run the necessary analysis to output
the results [3].
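Assigning numeric codes to a categorical column can be done as in the sketch below. The function name and the wall material values are hypothetical; the codes are simply assigned in order of first appearance and carry no ranking.

```python
def encode(values):
    """Map each distinct category to an integer code, in order of first appearance."""
    codes = {}
    encoded = []
    for value in values:
        # setdefault keeps an existing code and only assigns a new one when needed
        codes.setdefault(value, len(codes) + 1)
        encoded.append(codes[value])
    return encoded, codes

# Hypothetical "Wall Material" column.
wall_material = ["Concrete", "Brick", "Concrete", "Curtain Wall"]
encoded, codes = encode(wall_material)

print(encoded)
print(codes)
```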
The analysis techniques focused on were forward regression and stepwise regression. The reason
for selecting these techniques is explained in Section 3.2.1.
3.3.2 WEKA
WEKA is machine learning software that includes a collection of algorithms. Once the data is
formatted according to the requirements of this software, the algorithms can be used to produce
the results.
The data is divided into two sets. The training set, consisting of the majority of the data, is
what the algorithm learns from to make predictions, and the test set is used to validate the
accuracy of the predictions.
J48 Decision Tree Algorithm
This machine learning algorithm in WEKA is used to generate a decision tree that is primarily
used for classification. The output should predict the best fit variables that affect Energy Use
Intensity. This method is effective for quickly reading and visualizing which sets of predictors
impact energy usage most. However, it does not report the individual parameter impacts.
Random Forest Algorithm
This algorithm can be used for a variety of tasks including regression and classification. This
procedure includes the analysis of a large number of decision trees. The predictions of all the trees
are combined to produce a more accurate prediction, which is reported together with its prediction
accuracy as a percentage. The purpose of performing this analysis is to find the measures of
importance of each variable in the random forest. The first factor to consider is how much the
accuracy decreases when the variable is excluded. The second key value is the Gini impurity
decrease obtained when a variable is chosen to split a node.
In a random forest, the importance of each variable can be measured. Each tree in the forest is
evaluated on data containing the variable being assessed alongside the rest of the variables. The
values of the assessed variable are then randomly shuffled while the other variables are kept the
same, and the decrease in prediction accuracy on the shuffled data is measured. The average
decrease in prediction accuracy across all trees is calculated. For example, the cooling system
should show a strong correlation, that is, a higher decrease in prediction accuracy, with the
energy usage data for the summer months, while the heating system data should show the opposite.
To clarify, the importance is a measure of how much removing a variable decreases accuracy: the
larger the decrease, the more important the variable.
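The shuffling procedure can be sketched as follows. This is a simplified illustration and not the WEKA implementation: it evaluates one stand-in model on the whole dataset instead of each tree on its own held-out data, and the model, variable names and data are hypothetical.

```python
import random

def accuracy(model, rows, labels):
    """Fraction of rows for which the model predicts the recorded label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(model, rows, labels, feature, rng, repeats=20):
    """Average drop in accuracy after shuffling one variable's values."""
    base = accuracy(model, rows, labels)
    drops = []
    for _ in range(repeats):
        shuffled = [r[feature] for r in rows]
        rng.shuffle(shuffled)
        # Replace only the assessed variable; all other variables stay unchanged.
        permuted = [{**r, feature: v} for r, v in zip(rows, shuffled)]
        drops.append(base - accuracy(model, permuted, labels))
    return sum(drops) / repeats

# Hypothetical stand-in for a trained model: it only looks at "cooling".
def model(row):
    return "high" if row["cooling"] > 0.5 else "low"

rows = [{"cooling": i / 10, "heating": (i * 7 % 10) / 10} for i in range(10)]
labels = [model(r) for r in rows]            # summer-like labels driven by cooling

rng = random.Random(0)
imp_cooling = permutation_importance(model, rows, labels, "cooling", rng)
imp_heating = permutation_importance(model, rows, labels, "heating", rng)
print(f"cooling: {imp_cooling:.3f}, heating: {imp_heating:.3f}")
```

Shuffling the variable the model relies on lowers the accuracy, while shuffling a variable it ignores leaves the accuracy unchanged.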
Also, in the random forest, the “Gini impurity” is calculated at every split of a node. For
each variable, the Gini impurity decrease is added up across every tree of the forest each time
that variable is chosen to split a node. The total is then divided by the number of trees to
calculate the average. The scale is irrelevant; only the relative values matter.
Design Builder
The Design Builder energy modelling software is used to perform the energy simulations. This
modeling software already includes the recommended building components of an ASHRAE
Standard 90.1 model. The software is also capable of integrating accurate weather data to produce
more accurate energy usage results. Since the data is divided into large and small commercial
buildings, two models are created with dimensions that represent the buildings of either category.
Similar models are created using the ASHRAE recommendations. Next, various iterations of the
pair models are created by changing the established important parameters. The energy results of
all these models are organized and tabulated for comparison analysis.
3.4 Validation & Results
On SPSS, the key metrics to consider are the p-values, which indicate each factor’s significance,
and the change in R-squared value, which indicates the amount of change that it brings to the
dependent variable (Energy Use Intensity, EUI). Using these values, the independent variables
(building component parameters) are ranked according to their importance towards EUI.
On WEKA, the J48 algorithm option generates a decision tree. The decision trees are shown
visually as an expanding network where the “branches” split and eventually end at the “leaves” at
the bottom. Each split is a selection process which picks the most useful variable at that node.
The Random Forest, in contrast, combines many trees and generally makes more accurate predictions.
Once the building components are ranked according to their impact on energy use intensity,
simulations are performed to validate the ranking.
3.4.1 Simulation
After the critical building components are identified, model simulations are produced to
validate the findings as well as to compare models with the current standard guideline models. To
produce these simulations, the energy modeling software Design Builder is used to generate the
building models (Figure 3.4). The two typologies of buildings, small and large commercial
buildings, are maintained throughout the models. Simulations of existing buildings are given
parameters based on the real data. The ASHRAE model is then created using
the guide’s recommendations. Then, by changing only the critical building components, more
models are created.
Figure 3.4: A Rendered Small Commercial Building Model in Design Builder [28]
3.4.2 Results Comparison
All the models created in Design Builder generate energy usage values for annual energy use
intensity, as well as energy values for maximum heating and cooling days. The energy values of
all these models are tabulated and compared to validate the findings on the ranked building
components. If the critical components are successfully identified, a potentially better guideline
can be created that would be better suited for a specific geographic location and situation.
Figure 3.5: Design Builder Energy Results and Comparisons [28]
3.5 Chapter 3 Summary
Using both statistical and machine learning data analysis techniques, potential conclusions could
be drawn about the building components and their respective impacts on energy usage. After
crosschecking the statistical analysis methods with machine learning techniques, the critical
building energy components can be potentially ranked. Targeting these critical components,
several building energy models are simulated. The series of models considered are existing
building models, ASHRAE Standard 90.1 models, and improved models (based on ranked
components). The energy values are then assessed to validate the effectiveness of this methodology.
This approach can be explored by future researchers who wish to pursue a guideline creation
strategy or even just a strategy to rank different factors of an interconnected system of components.
Chapter 4: Data Analysis and Building Component Relations
This chapter gives a brief introduction to the relationship between the building site energy use
intensity (EUI) and various façade features and climatic factors. It provides a preview of the
data and the analysis based on descriptive statistics and machine learning. The data used in this
research was acquired by Dr. Okheon Lee, who gathered detailed information on 200 commercial
buildings in Seoul, Korea for the years 2014 and 2015. It includes building dimensions,
occupancy information, façade components, HVAC equipment and energy usage
details. A section of the dataset is shown in Figure 4.1. First, however, the data needs to be
categorized and split in ways that are relevant to the research outcome and analysis techniques,
which is discussed in the next section.
Figure 4.1: The Data, collected by Dr. Okheon Lee
4.1 Data Categorization and Observation
First, the full dataset is checked for missing information and organizational errors.
Next, it is important to categorize building types according to their size and use. All the
buildings are commercial but vary greatly in dimension, and therefore the data was grouped in
three sets: “Large Buildings” (Gross Area > 20,000), “Small Buildings” (Gross Area < 20,000) and
“All Buildings”.
The focus is on energy consumption, which is the dependent variable, so the energy use cases
considered are “Annual” usage, as well as “Summer(July)” and “Winter(January)”
usage. For the machine learning techniques used, the value of the energy consumption is grouped
as “Low”, “Medium” or “High”, as those techniques process nominal rather than numeric target
values. Each of the 18 sets of EUI values (three building groups, three periods and two years)
acts as the dependent variable in one of 18 separate models in each data analysis process.
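The following sketch shows how the 18 dependent variables arise from the groupings above and one possible way of turning numeric EUI values into the three nominal groups. The text does not state the binning rule that was used, so the rank-based tertile split here is an assumption, and the EUI values are hypothetical.

```python
from itertools import product

GROUPS = ["All Buildings", "Large Buildings", "Small Buildings"]
PERIODS = ["Annual", "Summer", "Winter"]
YEARS = [2014, 2015]

# One model per (year, group, period) combination.
models = [
    f"{year} {group} {period} EUI"
    for year, group, period in product(YEARS, GROUPS, PERIODS)
]

def to_groups(values):
    """Label each value Low/Medium/High by rank tertile (assumed binning rule)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [None] * len(values)
    for rank, index in enumerate(order):
        labels[index] = ("Low", "Medium", "High")[min(3 * rank // len(values), 2)]
    return labels

# Hypothetical EUI values, not taken from the dataset.
eui = [120.0, 310.5, 95.2, 201.7, 260.0, 150.3]
eui_groups = to_groups(eui)
```

A tertile split keeps the three classes the same size; fixed EUI thresholds would be an equally plausible choice.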
4.2 Regression
This section details how regression analysis was performed on the dataset to identify the key
energy consumption components. As outlined in Chapter 3, Forward regression is chosen because
it only enters predictor variables that make a significant contribution to the model.
Those variables are entered one at a time, and the process stops when no remaining variable
accounts for a significant amount of variance.
4.2.1 SPSS Software Setup Overview
As introduced in Chapter 2, IBM SPSS is a powerful statistical software package that is capable
of running many kinds of regression analysis for a given dataset. While it makes it easy to run
regression analysis, the data needs to be set up for it to work properly. Firstly, the software
does not accept nominal data directly. For that type of data (for example, “Orientation”,
“Heating System”, etc.), unique numeric values need to be assigned for the software to register
it. For example, the 5 options for “Heating System” are given the unique values shown in
Figure 4.2. This needed to be done for all nominal information. Only then is the data set
complete (Figure 4.3), after which the tests can be run.
Figure 4.2: IBM SPSS Building Component Options Setup
Figure 4.3: Completed data input in SPSS software
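The value assignment described above amounts to building a codebook for each nominal column. A minimal sketch follows; the option names and the resulting codes are hypothetical, since the actual assignments are the ones shown in Figure 4.2.

```python
def encode_nominal(values):
    """Map each distinct category to an integer code, in order of first appearance."""
    codes = {}
    for value in values:
        if value not in codes:
            codes[value] = len(codes) + 1
    return codes, [codes[value] for value in values]

# Hypothetical column; the real options and codes are those in Figure 4.2.
heating = ["Boiler", "District", "Individual", "Boiler", "HP"]
codebook, encoded = encode_nominal(heating)
```

Note that a linear regression treats such codes as ordered quantities, so the choice of codes can affect the fit.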
4.2.2 Regression Output
Once the data is complete and set, the regression analysis can be performed. The dependent and
independent variables are set, and Forward Regression is initiated with the settings shown
to get the necessary statistical information, marked in pink in Figure 4.4. The Forward Regression
method only enters predictor variables that make a significant contribution to the model. These
variables are entered one at a time, and the process stops when no remaining variable accounts
for a significant amount of variance.
Figure 4.4: SPSS Regression Output Specifications
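The entry procedure described above can be sketched as a plain forward selection loop. This is not SPSS's implementation: SPSS decides entry with a significance test on the F change, whereas this sketch uses a minimum R Square change (`min_r2_change`, an assumed stand-in) to stay self-contained. The predictor names and data are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def r_squared(columns, y):
    """R Square of a least-squares fit of y on the columns plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)] for p in range(k)]
    xty = [sum(design[i][p] * y[i] for i in range(n)) for p in range(k)]
    beta = solve(xtx, xty)
    mean_y = sum(y) / n
    ss_res = sum((y[i] - sum(b * v for b, v in zip(beta, design[i]))) ** 2 for i in range(n))
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - ss_res / ss_tot

def forward_select(predictors, y, min_r2_change=0.02):
    """Enter one predictor at a time; stop when the best R Square change is too small."""
    selected, steps, current = [], [], 0.0
    remaining = dict(predictors)
    while remaining:
        best_name, best_r2 = None, current
        for name, col in remaining.items():
            r2 = r_squared([predictors[s] for s in selected] + [col], y)
            if r2 > best_r2:
                best_name, best_r2 = name, r2
        if best_name is None or best_r2 - current < min_r2_change:
            break
        steps.append((best_name, best_r2 - current))
        selected.append(best_name)
        del remaining[best_name]
        current = best_r2
    return steps

# Hypothetical predictors: y depends on x1 and x2 only; x3 is unrelated.
predictors = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [1, -1, 1, -1, 1, -1, 1, -1],
    "x3": [1, 1, -1, -1, -1, -1, 1, 1],
}
y = [3 * a + 2 * b for a, b in zip(predictors["x1"], predictors["x2"])]
steps = forward_select(predictors, y)
```

Each entry of `steps` pairs an entered variable with its R Square change, which mirrors the “Variables Entered” and “R Square Change” columns read from the SPSS output.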
The test is repeated by selecting the other EUI (Energy Use Intensity) values as dependent
variables, marked in red in Figure 4.5. The objective here is to run the regression against each
EUI value to better understand how much effect each of the components has on overall
building energy use. The screenshots below show only the main relevant output for this
application, where the key factors are the “Variables Entered”, their “R Square Change” and their
“Sig F Change”. These factors are further discussed in Section 4.2.3.
For example, in the results produced when the dependent variable is set to “Large Buildings
Annual EUI” (Figure 4.5), the two key variables that make a significant contribution to the
overall model are “Floor Height” and “Occupancy Density”, which respectively account for
12.5% and 9.7% of the variation in the model.
Figure 4.5: Regression Output. Dependent Variable: Large Building Annual EUI
Next, the dependent variable is changed to “Small Buildings Summer EUI” (Figure 4.6). The two
key variables that make a significant contribution to the model are “Cooling System” and
“Occupancy Density”, which respectively account for 14.6% and 7.1% of the variation in the model.
This is further explained in Section 4.2.3.
Figure 4.6: Regression Output. Dependent Variable: Small Building Summer EUI
In the next example, the dependent variable is “All Buildings Summer EUI” (Figure
4.7). The five key variables that make a significant contribution to the model are “Stories”,
“Cooling System”, “Floor Height”, “Occupancy Density” and “Window Wall Ratio”, which
respectively account for 5.3%, 6.0%, 3.4%, 2.4% and 2.2% of the variation in the model.
Figure 4.7: Regression Output. Dependent Variable: All Buildings Summer EUI
4.2.3 Regression Results Key Factors
The key players of the stepwise regression results analysis are explained below.
Variables Entered
The first table of each output, labelled “Variables Entered/Removed”, identifies, one at a time,
the variables that have statistical significance (Sig. F Change) as well as a significant R Square
change. The variables that are not shown fail one or both of those criteria and hence
are omitted. The variables shown contribute most to the overall model and are listed in descending
order. In the table in Figure 4.5, the green box shows that “Floor Height” contributes the most
change in the model, followed by “Occupant Density”, which makes the second-largest
contribution. All other variables are removed because they do not meet the criteria.
P-Value (Sig. F Change)
This value indicates whether the variable is significant to the model or not. Any variable with a
P-value over 0.05 is labelled insignificant and is excluded from the model. In Figure 4.5 the two
chosen variables, “Floor Height” and “Occupant Density”, have P-values of 0.000 and 0.001
respectively (labelled by the blue box) and hence are significant.
R Square Change
The change in R-squared represents the amount of unique variance that each variable explains
above and beyond the other variables in the model. In Figure 4.5, in Model-1, the R Square
Change value (labelled by the red box) of 0.125 indicates that the variable “Floor Height”
accounts for 12.5% of the variation in the model, and the value of 0.097 in Model-2, which adds
the variable “Occupant Density”, accounts for a further 9.7% of the variation. This factor
plays a critical role in understanding the extent of the effect of these significant variables.
All this resultant information is tabulated for analysis and to realize the significance and effect of
each component (variable). This is done in Section 4.4.
4.3 Machine Learning
This section details how some machine learning algorithms were implemented to further
examine the data. The Weka software is chosen because of its simplicity of use and quick outputs
using the J48 Decision Tree algorithm.
4.3.1 Weka Software Setup Overview
The first step is to convert the Excel file to the “ARFF” file format that Weka can read.
Weka already includes this converter in the software. To use this converter, the Excel file first
needs to be saved in the “CSV (MS-DOS)” format (Figure 4.8).
Figure 4.8: Excel File Conversion for Weka
Then, to convert to the ARFF file, Weka’s “Experimenter” option can be used. In the
“Analyze” tab, the CSV file can be opened and then saved as an ARFF file
(Figure 4.9).
Figure 4.9: Weka Excel data input complete
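For readers unfamiliar with the target format, the sketch below writes a small ARFF file by hand from CSV text. It is only an illustration of what the converted file contains, not the converter built into Weka: the column names and rows are hypothetical, every column is treated as nominal, and names containing spaces would additionally need quoting.

```python
import csv
import io

def csv_to_arff(csv_text, relation):
    """Convert CSV text to ARFF text, treating every column as nominal."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    lines = ["@relation " + relation, ""]
    for index, name in enumerate(header):
        # A nominal attribute lists its allowed categories in braces.
        categories = sorted({row[index] for row in data})
        lines.append("@attribute " + name + " {" + ",".join(categories) + "}")
    lines += ["", "@data"]
    lines += [",".join(row) for row in data]
    return "\n".join(lines)

# Hypothetical three-column sample.
sample = "Cooling,Shading,EUI_Group\nChiller,Yes,Low\nHP,No,High\n"
arff = csv_to_arff(sample, "buildings")
```

The header declares each attribute and its categories, and the `@data` section repeats the rows of the CSV file unchanged.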
Now the file can be opened by Weka and different algorithms can be applied for analysis. Once
the file is opened (Figure 4.10), in the “Classify” tab, the “Choose” button allows the selection
of algorithms, where the J48 Decision Tree is selected. Cross validation is selected with 10 folds
as it is the most common practice. Next, the dependent variable is selected, which in this example
is the “2015 ANNUAL EUI GROUP”. After running this, the results are shown in the Classifier
output section. The output in the result list can be right-clicked and “visualize tree” selected
to show the output tree (Figure 4.11).
Figure 4.10: Weka Output. Decision Tree J48. Dependent Variable: 2015 Annual EUI Group
Figure 4.11: Weka Output. Visualized Decision Tree J48. Dependent Variable: 2015 Annual EUI Group
The different dependent variables are chosen one at a time and the model is run repeatedly to
output the needed results.
4.3.2 J48 Decision Tree Results
The J48 algorithm performs top-down induction of decision trees; it is Weka’s implementation of
the C4.5 algorithm. At each node it picks the attribute that best separates the classes. The tree
begins with a root node, which is the most significant contribution node, then
follows down to the leaf nodes, which in this case are “Low”, “Medium” or “High”. Within the
brackets at the leaf node, the number to the left is the number of instances that obeyed the
classification and the number on the right is the number of exceptions. Leaves with a single
number indicate the number of times that the model correctly outputted that prediction. Each
pathway leads to an outcome of energy usage. Therefore the 9 different dependent variables output
9 different trees, each with its own pathways. Two of these trees are shown below and the rest are
shown in the Appendix.
Figure 4.12: Weka Output. Visualized Decision Tree J48. Dependent Variable: 2015 All Buildings Annual EUI Group
In pursuit of finding components that contribute to low energy usage, the highlighted pathways,
which lead to a “Low” EUI outcome with high probability, are of the most interest. The dependent
variable here is “2015 All Buildings Annual EUI”. This group contains data of both large and
small buildings and is focused on annual energy usage. In this tree, the five “Low” EUI output
pathways are listed below:
• Cooling (Chiller) > Heating (Boiler) > Footprint (Other) > Window Type (Tilt Turn) >
Low (3.0)
• Cooling (Chiller) > Heating (District) > Roof Material (Concrete) > Ceiling (Low) > Low
(6.0/1.0)
• Cooling (Individual) > Low (40.0/14.0)
• Cooling (HP) > Orientation (E) > Rise_N (Low) > Low (6.0)
• Cooling (HP) > Orientation (S) > Shading (Yes) > Low (3.0)
The cooling system has the most significance as it is at the root node here. Between the first two
pathways, the second one, with 6 correct instances and 1 incorrect instance, is a better option
than the first one with 3 correct instances.
Overall, these are the statistically significant recommended options to reach a low energy use
scenario based on the data that the algorithm was given. The pathways are analyzed later to find
the best set of options for a small or large commercial building in Seoul, Korea.
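Reading the “Low” pathways off a tree can be pictured as a walk from the root to every leaf. The sketch below stores only the five pathways listed above in a nested dictionary (the remaining branches of the tree in Figure 4.12 are omitted) and enumerates them; the data structure and the function name are assumptions made for illustration.

```python
# Partial tree holding only the five "Low" pathways listed above;
# the other branches of the tree in Figure 4.12 are omitted.
tree = {"Cooling": {
    "Chiller": {"Heating": {
        "Boiler": {"Footprint": {"Other": {"Window Type": {"Tilt Turn": "Low (3.0)"}}}},
        "District": {"Roof Material": {"Concrete": {"Ceiling": {"Low": "Low (6.0/1.0)"}}}},
    }},
    "Individual": "Low (40.0/14.0)",
    "HP": {"Orientation": {
        "E": {"Rise_N": {"Low": "Low (6.0)"}},
        "S": {"Shading": {"Yes": "Low (3.0)"}},
    }},
}}

def low_paths(node, prefix=()):
    """Yield every root-to-leaf path whose leaf label starts with 'Low'."""
    if isinstance(node, str):
        if node.startswith("Low"):
            yield " > ".join(prefix + (node,))
        return
    for attribute, branches in node.items():
        for option, child in branches.items():
            yield from low_paths(child, prefix + (f"{attribute} ({option})",))

paths = list(low_paths(tree))
```

Each string in `paths` has the same form as the bulleted pathways, with the attribute tested at the root appearing first.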
Figure 4.13: Weka Output. Visualized Decision Tree J48. Dependent Variable: 2015 Small Buildings Winter EUI Group
The dependent variable for this tree is the EUI for Winter (January). The three recommended
“Low” EUI output pathways are listed below:
• Cooling (ACH) > Shading (No) > Low (27.0/12.0)
• Cooling (HP) > Low (26.0/6.0)
• Cooling (Individual) > Heating (Individual) > Low (14.0/3.0)
In this tree, no shading is recommended, and it is an accurate prediction because January has the
maximum heating days and therefore shading is not beneficial for lowering EUI. Shading blocks
sunlight that would otherwise help decrease the heating load.
Following this trend, the remaining trees are modeled and the results are gathered and tabulated
in the next section to better understand the key indications of better or worse building
components with respect to energy usage, and to further rank the parameters and their
individual options.
4.4 Parameter Ranking
The results from the two methodologies are tabulated below to understand the differences and
establish separate building component rankings.
4.4.1 Regression Results
Using SPSS and Stepwise Regression the 18 models are analyzed. The significant variables and
their corresponding “P-Value” and “R Square Change” values are noted in the table. The table is
color coded to indicate the Annual models in green, Summer models in yellow, and Winter models
in blue. The table is divided into “All Buildings”, “Large” and “Small” buildings, which are
separated by the orange lines.
Figure 4.14: Regression Output Summary
Some models contain more significant variables than others. For example, the “2015 Winter
Large” model (indicated by the light blue box) presented 6 significant variables while the “2015
Annual Small” model (indicated by the light green box) presents just 1 significant variable.
However, the “2014 Annual Small” regression result showed no significant variable, and the reason
for this is discussed in Section 4.4.4. The frequency with which a variable appears across the
models is one of the key indicators of component importance to the overall building system. The
average R Square change for a specific variable is the other key indicator. Both metrics are used
to establish whether a component is more critical to the model than the other components.
The table in Figure 4.15 indicates the frequency of the occurrences. Here, “Occupancy
Density” was by far the most recurring variable, with a high average R Square change, followed
by “Floor Height” at 8 occurrences, then “Heating System” at 6, and so on. The components are
ranked in descending order as shown in the right figure, based on frequency and R Square change.
Figure 4.15: Regression Ranking
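A ranking built from these two indicators can be sketched as follows. The text does not say exactly how frequency and average R Square change are combined, so sorting by frequency first and by the average second is an assumption of this sketch. The input is an illustrative subset of R Square change values quoted in Section 4.2.2, not the full 18-model table.

```python
from collections import defaultdict

def rank_components(entries):
    """Rank variables by how often they appear, then by mean R Square change."""
    changes = defaultdict(list)
    for variable, r2_change in entries:
        changes[variable].append(r2_change)
    summary = [(name, len(vals), sum(vals) / len(vals)) for name, vals in changes.items()]
    return sorted(summary, key=lambda item: (-item[1], -item[2]))

# Illustrative subset of (variable, R Square change) pairs, one per appearance in a model.
entries = [
    ("Occupancy Density", 0.097), ("Floor Height", 0.125),
    ("Occupancy Density", 0.071), ("Cooling System", 0.146),
    ("Occupancy Density", 0.024), ("Floor Height", 0.034),
]
ranking = [name for name, _, _ in rank_components(entries)]
```

With this subset, the most frequent variable comes first even though a less frequent one has a larger single R Square change, which is the trade-off the two indicators are meant to balance.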
4.4.2 Decision Tree Ranking
Following a similar table structure to the regression analysis in Section 4.4.1, the table for the
tree results was created as shown. Here the output results are optimal component guidelines to
achieve “Low” EUI, as discussed in Section 4.3.2. For ease of viewing, the table is broken into
two parts below (Figures 4.16 and 4.17).
Figure 4.16: Decision Tree Output Summary Part 1
Figure 4.17: Decision Tree Output Summary Part 2
Out of the 18 trees to be produced, three models produced no tree nodes. This occurrence is
discussed further in Section 4.4.4. However, the other models produced healthy trees, of which
only the pathways that lead to “Low” with high probability were noted in the table. For example,
in the model “2015 All Buildings Summer”, there are 6 recommendations. The first of the six is
Cooling (Individual) (40.0/3.0), meaning that the Individual cooling system is recommended and
its prediction accuracy is 93% (from 40 correct and 3 wrong instances).
The data is analyzed for each building component and is tabulated in Figure 4.18.
Within each component are the options it contains. The frequency of the presence of each
option is noted, as well as the total frequency (in red) for the overall component.
To obtain the prediction accuracy, the total correct and wrong instances are noted similarly. Both
the prediction accuracy and the frequency are key metrics for understanding the importance of each
overall component as well as the corresponding options within it. The “Option Ranking” and the
“Classification Ranking” columns indicate the rankings. For example, the Cooling System is
ranked 1 overall as it had the highest frequency in the trees and a high prediction
accuracy. For the options, similarly, the “Chiller” was ranked highest, then “Individual”, then
“ACH” and so on.
Figure 4.18: Decision Tree Output Summary Analysis
After analyzing the data, the ranking was established below (Figure 4.19) in descending order
similar to the regression ranking for a further comparison in the next chapter. This ranking also
includes the best option within each component in descending order.
Figure 4.19: Decision Tree Ranking
4.4.3 Ranking according to an expert
It is important to note the architect’s perception of the component ranking for further
comparison. USC’s Professor Marc Schiler provided his ranking based on his experience and
intuition (Figure 4.20). This will serve as the “Building Designer’s Ranking”. Even though this
ranking is not backed by specific scientific data, its comparison with the established
data-driven rankings is still of interest. It gives an insight into how building designers
prioritize certain factors over others and how this compares to data-driven priorities. This is
further explored in Chapter 5.
Figure 4.20: Building Designer Ranking
4.4.4 Limitations
As with most large data-driven analysis processes, there are key imperfections in the results
presented above in 4.4.1 and 4.4.2. The root of the problem is the data, which is understandably
not perfect, and so the results produced from it are not perfect. For example, the data includes
both new and old buildings. The wall material for the majority of the buildings is block. The vast
majority of the buildings considered in the data also had no shading, which is understandable as
the location is in a colder climate but is not helpful for the accuracy of the analysis. Overall,
the data lacked an even distribution of varying components needed for a complete analysis.
For example, the “2015 Winter Large” model, indicated by the bright blue box in (FIGURE),
contains 6 useful variables whilst the “2014 Annual Small” model showed no variables that were
significant enough. The latter model lacked a consistent relationship between building components
and energy usage, and therefore those variables were eliminated in the process. In the decision
tree analysis, some models, indicated by the green boxes in Figures 4.16 and 4.17, produced no
trees for the same reason.
A possible solution would be to add more building data of different characteristics to the existing
data to make more complete and accurate models for analysis.
4.5 Chapter Summary
The aim of this chapter is to show the methodology for assessing building data by two
techniques. This includes data cleaning, compartmentalization, software processes, and data
analysis.
The characteristics of the 18 models that were compared and analyzed were outlined. The
methodology of using IBM SPSS software to perform stepwise regression is explained. The
methodology of using Weka for producing J48 decision trees and results is explained. The results
from both methodologies were tabulated in compact forms with only key metrics for analysis.
From those tables, two ranking systems emerged that ordered the building components according
to their energy consumption criticality. A third ranking was also outlined to represent a building
designer’s perspective for comparison.
The goal of this chapter is to establish the building component rankings and set the stage for the
next chapter where the rankings will be scrutinized via energy modeling.
Chapter 5: Simulations and Validation
The conclusions of building component criticality from chapter 4 are assessed in this chapter.
Software simulation of models is an effective method for predicting energy usage. Modern
simulation software like Design Builder makes it easy to simulate models. With built-in building
components and standards in the software, individual components like the HVAC system or
façade characteristics can be altered and the simulations can be run to predict energy usage
within a specified timeline. Location and climate data are also readily available and can be
implemented in the simulations. Based on the existing building data and ranking systems
established in the previous chapter, several models are run and their results are compared in the
following sections.
5.1 Design Builder Models Criteria
Based on the existing dataset, the commercial building is modeled in Design Builder. The
building dimensions are based on the average gross floor area of the buildings from the “Small
Buildings” dataset, which amounts to 8270 square meters. Also, from the dataset, the average
“Stories” is calculated to be 9 floors. Hence each floor amounts to 919 square meters. Since
“Orientation” is one of the factors compared, the building is given a rectangular shape with a 1:3
ratio which makes it 17.5 meters by 52.5 meters. With the dimensions established, the building is
modeled as shown in Figure 5.1. The other components of the model are discussed in the next
section.
Figure 5.1: Initial Design Builder model
5.1.1 Baseline and Improved Model Overview
It is necessary to establish a baseline building upon which improvements can be made.
As mentioned, the building dimensions were set according to the dataset that this
experiment was based on to maintain consistency. The building components and materials
chosen are based on ASHRAE Standard recommendations. The improved components were
chosen based on the specifications (which can be viewed in the modeling software) which show
how well they perform. The specific improved components were chosen based on a 20-30%
improvement from the baseline ASHRAE recommendation. The component selections are
summarized in Table 5.1 below.
Table 5.1: Design Builder Model Specifications for Ranking System
HVAC causes a large change in energy usage, so it is improved by only 21% which is still a
drastic change as will be seen later. Shading, in this study, has only two options being “Yes” and
“No” and therefore holds no value for Improved %. Likewise, orientation is also nominal, having
only two main options for this study.
5.1.2 The Simulation Components
The components can be changed easily in Design Builder. Before the creation of the model, the
settings of the default model were changed to follow the ASHRAE 90.1 recommended values
and components. This is built into the software. The critical components that are considered and
changed are assessed below. An improvement of 20-30% for every component is considered to
ensure an even playing field for analysis. The U-Value is a measure of thermal transmittance and is
used to indicate the insulating properties of a material. A lower U-Value means better insulating
performance, hence less heat loss. Materials in components such as Wall Material or Glass Type
were selected by comparing U-Values. The component selection is
illustrated next, in detail.
Occupancy Density
The recommended occupancy is 22 square meters per person and a 30% improvement amounts
to 29 square meters per person (Figure 5.2). This results in a substantial decrease in the utility
of the building.
Figure 5.2: Specifications of Occupant Density Improvement (Right is better)
Wall Material
The recommended option by ASHRAE for wall material consists of concrete block and outer
brickwork with a combined U-Value of 0.350. The improved option selected includes more
insulation, which brings the U-Value down to 0.251, a 29% improvement (Figure 5.3).
Figure 5.3: Specifications of Wall Material (Right is better)
Roof Material
The ASHRAE recommended roof consists of asphalt, insulation, and plasterboard with an
overall U-Value of 0.250. The improved roof consists of metal cladding and insulation with an
overall U-Value of 0.179, which is a 28% improvement. (Figure 5.4)
Figure 5.4: Specifications of Roof Material (Right is better)
Window Wall Ratio
The common practice for the window to wall ratio of ASHRAE commercial buildings is 40%. The
improved component is chosen to be 30%, which is a 25% reduction in the overall component.
This is shown in Figure 5.6.
Figure 5.5: Design Builder options for Window Wall Ratio, WWR (Right has lower WWR)
Figure 5.6: Model Window Wall Ratio (WWR) comparison. (Right has lower WWR)
Window Glass Type
The recommended standard glazing type is a 2 layered glass panel with an overall U-Value of
2.47. The improved double glass panel has a U-Value of 1.76 which is a 29% improvement.
(Figure 5.7)
Figure 5.7: Specifications of Window Glass Type comparison (Right is better)
Window Frame Material
Recommended window frame material is aluminum with a U-Value of 5.01. The chosen
improved material is wood with a U-Value of 3.63 which is a 28% improvement. (Figure 5.8)
Figure 5.8: Specifications of window frame material comparison (Right is better)
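The percentage improvements quoted for the U-Value based components above follow from a simple relative reduction. The sketch below applies that formula to the baseline and improved U-Values stated in the text; the function and dictionary names are illustrative only, and the results may differ slightly from the rounded percentages given above.

```python
def improvement_percent(baseline, improved):
    """Percentage reduction of the improved value relative to the baseline."""
    return (baseline - improved) / baseline * 100.0

# (baseline, improved) U-Values as stated in the text above.
u_values = {
    "Wall Material": (0.350, 0.251),
    "Roof Material": (0.250, 0.179),
    "Window Glass Type": (2.47, 1.76),
    "Window Frame Material": (5.01, 3.63),
}
improvements = {name: improvement_percent(*pair) for name, pair in u_values.items()}
```

All four values fall inside the 20-30% band that the study uses to keep the component changes comparable.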
Shading
Shading is turned on (or off) by the toggle shown in Figure with an overhang of 1 meter.
According to the dataset, these are the only options, so this study reflects that. (Figure 5.9)
Figure 5.9: Model Shading Off and On comparison. (Right is On)
HVAC System
One of the recommended HVAC systems is the VAV Air Cooled Chiller, which is selected as
shown in Figure 5.10, where the annual HVAC cost per cooling and heating area is 358.35
USD/m2 GIFA (US Dollars per Square Meter of Gross Internal Floor Area). The improved
selection is the Air Cooled Chiller Fan Coil Unit, which has a cooling and heating cost of 280.45
USD/m2, lowering energy cost by 21%.
Figure 5.10: Specifications of HVAC system (Right is more efficient)
Orientation
Having a long rectangular footprint, the better option is for the larger side to face the North-
South as opposed to East-West, in order to maximize the use of daylight and solar gain to reduce
heating load. The left part of Figure 5.11 shows worse orientation while the right one shows the
better choice.
Figure 5.11: Rendering in Design Builder showing Orientation differences and sun path. (Right is better as the largest face faces
the North-South direction)
Floor Height
The recommended commercial building floor height was set at 4.0 meters. This was lowered to
3.5m which is a 20% decrease. A 30% decrease would be too low to meet the standard. (Figure
5.12)
Figure 5.12: Model Floor Height comparison.
Stories
The first model has 9 floors, and a 30% reduction in the number of stories results in 6
floors. Since the gross area has to remain 8270 square meters, the building needs
a larger floor area per story (Figure 5.13).
Figure 5.13: Model Stories comparison (Right has fewer floors but a larger footprint)
5.2 Simulation Results
The simulations are run to produce annual EUI results (Figure 5.14). This value considers not only
the building occupancy, occupancy usage and utility consumption, but also solar heat gain and
climate conditions throughout the year.
Figure 5.14: Design Builder Simulation Results for Initial model (top) and improved model (Bottom)
The components are changed sequentially according to the respective ranking system and
simulated at every change. Annual energy results are noted at each simulation, and the delta is
used to calculate the percentage improvement.
5.2.1 Simulations for Ranking Systems
From Chapter 4, three building component ranking orders were gathered for comparative
analysis. The comparison is performed with annual EUI simulation values. The following
Tables 5.2, 5.3 and 5.4 were created to show the simulation values. The building components are
improved sequentially, according to the specific ranking, and the value is noted. The energy
saved (Delta) with every added improvement is noted and the percentage improvement is
calculated. For example, at Rank 7 of the Building Designer ranking, the model includes all the
previous improvements before it. This is performed to note the rate of improvement, which will
be further compared in a later section.
The baseline model created, which follows ASHRAE recommendations, produced an annual EUI
of 248.53 kWh/m2. This was the starting model for all three groups. For each group, with
every ranking step and subsequent improvement, simulations were performed and the Energy
(kWh/m2) is noted. The delta is used to calculate the individual rank Improvement. The
Cumulative Improvement Percentage is also noted.
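The bookkeeping behind these tables can be sketched as below. Only the baseline of 248.53 kWh/m2 comes from the text; the EUI values after each step are hypothetical placeholders, not simulation output, and expressing each step's improvement as a share of the baseline is an assumption of this sketch.

```python
def improvement_table(baseline, eui_after_each_step):
    """Return (delta, step %, cumulative %) for each sequential improvement."""
    rows, previous = [], baseline
    for eui in eui_after_each_step:
        delta = previous - eui                      # energy saved by this step
        step_pct = delta / baseline * 100.0         # share of the baseline EUI
        cumulative_pct = (baseline - eui) / baseline * 100.0
        rows.append((delta, step_pct, cumulative_pct))
        previous = eui
    return rows

BASELINE_EUI = 248.53  # kWh/m2, from the text
# Hypothetical EUI values after each ranked improvement; not simulation output.
steps = improvement_table(BASELINE_EUI, [200.0, 190.0, 185.0])
```

Because every step is measured against the same baseline, the step percentages add up to the cumulative percentage at each rank.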
Table 5.2: Building Designer Ranking Simulation Output
Table 5.3: Regression Ranking Simulation Output
Table 5.4: Decision Tree Ranking Simulation Output
By the end of the groups, the cumulative improvement is seen to be almost the same and this is
because by that point, the three models in comparison are almost identical. The decision tree
table has one less factor because the ranking itself removed “Window Frame Type” at the
ranking selection process, noted in Chapter 4.
5.2.2 Simulation Results Analysis
Charts are created from the percentage improvement tables as shown in Figures 5.15 and 5.16.
The first three charts were created to indicate the individual rank improvement. The second
group indicates the cumulative improvement charts for further clarification. Theoretically, the
perfect ranking system will show a gradual decline in individual percentage improvement and
incline in the cumulative improvement charts.
The HVAC system had by far the greatest improvement, at about 37%, and should be at the
top in all rankings, but this is not the case here.
The Building Designer Ranking was correct about the HVAC system, but for the following
ranks the simulations show fluctuating improvements. For example, the window glass type,
floor height and orientation should be ranked higher.
The Regression Ranking was incorrect about the HVAC system and also shows fluctuating
improvements throughout the chart. Although it was correct about the floor height, it is a worse
ranking system than the Building Designer Ranking.
The Decision Tree Ranking, like the Building Designer Ranking, inaccurately ranked floor height.
However, it showed the most consistent decline of the three.
Figure 5.15: Individual Ranking Component Improvement Charts (Left to Right: Building Designer, Regression, Decision Tree)
Figure 5.16: Ranking Component Cumulative Improvement Charts (Left to Right: Building Designer, Regression, Decision Tree)
When considering the three methods of ranking, the decision tree guideline proves to be the best
option when finding the most important building components. The ranking produced from the
statistical approach, stepwise regression, proved to be inaccurate and inconsistent in this test.
5.3 Individual Components Improvement Based on Decision Tree
In Chapter 4, along with the component ranking, the options within each component were also
ranked according to their predicted association with low energy usage. The options are shown again in Table 5.5
below, ordered from most to least effective, with the best and worst options
noted. To validate this finding, simulations were conducted.
Table 5.5: Decision Tree Component and Options Ranking
This table also includes an additional factor, “Building Footprint” which was not considered in
the overall ranking analysis because it was deemed insignificant in the regression ranking and
was eliminated.
5.3.1 Model Simulations Criteria
For this group of simulations, the concept is to start with the model that has the component
options least recommended by the decision tree and then improve it by individually
selecting the most recommended option for each component.
Like the simulations covered in Section 5.1, the model building has a gross floor area of 8270
square meters. However, according to the decision tree, the least preferred building footprint
shape is the L-Shape. Therefore, an L-shaped model was created that kept the 1:3 ratio and 9 floors,
with the same single-floor area of 919 square meters. Rectangular was the most recommended
option. (Figure 5.17)
Figure 5.17: Model Building Footprint Comparison: L-Shaped (Left) Rectangular (Right)
For the comparison of story counts, the improved model has fewer stories. A 30%
decrease from 9 floors gives approximately 6 floors. This in turn increases the individual floor area. (Figure
5.18)
Figure 5.18: Model Stories comparison (Right has fewer floors but has larger footprint)
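The arithmetic behind the story-count change can be checked with a short sketch. It assumes that the gross floor area of 8270 square meters is held constant when the number of floors is reduced, which is how the larger footprint in Figure 5.18 is read here. The variable names are illustrative.

```python
GROSS_FLOOR_AREA_M2 = 8270   # from the text
BASE_FLOORS = 9              # from the text

# A 30% reduction of 9 floors is 6.3, which rounds to 6 whole floors.
reduced_floors = round(BASE_FLOORS * (1 - 0.30))

# Assumption: gross floor area stays fixed, so fewer floors means larger floors.
base_floor_area = GROSS_FLOOR_AREA_M2 / BASE_FLOORS
reduced_floor_area = GROSS_FLOOR_AREA_M2 / reduced_floors

print(reduced_floors, round(base_floor_area), round(reduced_floor_area))
```

Under that assumption, each floor grows from roughly 919 to roughly 1378 square meters.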
The least preferred option for the HVAC system is District Heating and Cooling, while the most
recommended choice is Chiller. The options are selected as shown (Figure 5.19). Note that they
have the same HVAC cost per unit heating and cooling load, but their setups produce different
energy usage results during simulation.
Figure 5.19: Specifications of HVAC system (Right is more efficient)
The simulations are run to produce annual EUI results. The components are updated one at a
time and the simulation is rerun. The top half of Figure 5.20 shows the total energy usage of 202.06
kWh/m2 for the initial control (worst) building model, which used a “District” HVAC system. The
bottom half shows the results of the model with only the HVAC system updated to the “Chiller” type,
which lowered the energy usage to 184.20 kWh/m2.
Figure 5.20: Design Builder Simulation Results for Initial model (top) and improved HVAC model (Bottom)
Likewise, the other components were improved, simulated and the results tabulated in the next
section.
5.3.2 Simulation Results and Analysis
The primary goal of this section is to validate the decision tree’s finding of the best and worst
options within each component. In Table 5.6 below, the EUI simulation outputs in the
Energy (kWh/m2) column show the energy usage after selecting the better recommended option. The
initial model, which consisted of the lowest-ranked options, produced an annual energy usage
of 202.06 kWh/m2. The Delta column is populated by subtracting each result from that initial value,
and the improvement percentage is calculated from it. Each recommended option
selected is seen to produce a positive energy improvement.
Table 5.6: Decision Tree Individual Options Simulation Output
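As a worked instance of the Delta and improvement columns, the sketch below uses the two HVAC values quoted in Section 5.3.1. It assumes that the improvement percentage is the delta divided by the initial model's EUI. The function name is illustrative.

```python
INITIAL_EUI = 202.06     # kWh/m2, worst-option model (from the text)
HVAC_ONLY_EUI = 184.20   # kWh/m2, same model with the "Chiller" option (from the text)


def saving_vs_initial(initial, improved):
    """Return (delta, percent improvement), both measured against the initial model."""
    delta = initial - improved
    return delta, 100.0 * delta / initial


delta, percent = saving_vs_initial(INITIAL_EUI, HVAC_ONLY_EUI)
print(f"Delta = {delta:.2f} kWh/m2, improvement = {percent:.2f}%")
```

Because every component here is compared against the same initial model, the rows are independent of each other, unlike the sequential ranking simulations in Section 5.2.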
Figure 5.21 was created from the table above to visualize the impact each better
option has on the overall EUI. It is interesting to note that the HVAC system does not
show as great an impact on EUI as it did in the earlier ranking simulations. Here, the HVAC
change theoretically uses the same amount of energy with a different strategy, unlike
the change made in the ranking simulations, where the whole system was changed on the criterion of
energy usage alone.
Figure 5.21: Individual Component Improvement (as per Decision Tree recommendations)
It is concluded that each component, when changed from the worst to the best recommendation
defined by the decision tree, produced positive energy savings. This is a key indication that the option
ranking conclusions from the decision tree were accurate.
5.4 Research Outcome and Limitations
In Chapter 4, the rankings of building components were established using two strategies,
statistical regression and machine learning decision trees, which showed little similarity in their results.
The multiple simulations performed in this chapter validate some of the arguments made from
those conclusions and invalidate others.
The simulation results showed that the regression ranking was not accurate in its
predictions. Although regression is a useful statistical method, the charts presented show that its results
were highly inconsistent, and it is therefore a poor methodology for studying building component
relations. This outcome stems from the way regression is carried out. One of the main
reasons the regression methodology failed is that it depends heavily on linear relationships in the existing
data. Regression may be able to tell whether a building component is a significant factor
in the model, but its estimate of that component's effect on the overall model is seen to be imprecise.
However, unlike regression, the decision tree computes its findings by analyzing multiple
generated models to find the best fit to predict low energy usage. This methodology proved
to be far more effective and accurate, as the charts indicated. The much more consistent
improvement across ranks shows the method’s potential. Also, the component options
comparison in Section 5.3 validates that the results presented are accurate. Overall, this
methodology has shown the aptitude to compare and rank building components
according to their criticality to energy usage.
One of the key limitations of using a data-driven approach is the lack of resolution in the data. Some
components, like “Shading”, lack options, even though this component can be architecturally
designed to perform significantly better. Orientation is another factor that lacked specificity.
This reduces the granularity of the research output significantly. In addition, even though the
dataset used for this study was plentiful, the variations in building components were not
evenly distributed, and this contributes to inaccuracies. More stratified data will result in a more
accurate ranking system.
Another limitation of this study is that the component changes studied via simulations were limited by
the software itself, because of the lack of options and the inability to easily add new components or change
existing ones in Design Builder. Because of this, the 20-30% improvements were used to
validate the ranking system. More robust energy simulation software or methodology, together
with a more detailed dataset, will result in a more detailed and accurate ranking system and
simulations.
5.5 Chapter 5 Summary
The purpose of this chapter is to validate the findings of Chapter 4 through the analysis of
building energy simulations. The findings in question are three ranking systems: a statistical
ranking system that used stepwise regression, a machine learning ranking system that
used decision trees, and the experience- and intuition-based
building designer’s ranking system.
The simulation model was created based on the existing building data that shaped the ranking
systems to begin with. The specifications and components are noted and altered in the model
separately as needed. Several simulations are performed to output annual energy use values.
These values are further tabulated and charted for comparative analysis.
Upon examination, it was seen that the regression ranking was the worst of the three as it was
generally inaccurate in its ordering of components. The designer’s ranking was correct in some
instances but in others it was not. The decision tree ranking, although not perfect, proved to be
the most accurate of the three.
Chapter 6: Conclusion and Future Work
6.1 Research Summary
Data-driven techniques can be used to identify relationships between factors that are otherwise
difficult to find. Although they have been successfully implemented in many facets of life, they are still
in the early stages when it comes to building science. Now that more and more building data is
becoming accessible, the effective use of data-driven techniques has become more
viable. One of the most referred-to building energy standards is ASHRAE, which is not only
updated just once every few years but is also less specific, since it must be globally
accommodating. The premise, then, was to find a method by which data-driven science could be
implemented to help create a more tailored guideline that could be updated not only
quickly but also more effectively according to geographic location.
Background research was done to understand the current knowledge of building components and
energy usage, and where data-driven implementation in building energy stands. Researchers have
established that building envelope components play a vital role in building
energy consumption. However, they are not considered as critically in building design. Some of the
most popular energy standards were also explored and compared to identify the differences and
issues. It was seen that not only was there a lack of synchronicity, but also that these
guidelines are not updated as frequently as they need to be. The guidelines are also quite generalized,
which makes them less effective in varying geographic regions. The current applications of
data-driven energy prediction were also explored. It was seen that researchers used sophisticated
machine learning algorithms to generate buildings to compensate for the lack of building data at
the time, and that those approaches lacked precision in the overall research outcomes.
Therefore, the goal of this study was to establish a methodology in which a data-driven approach
can be implemented, in a simple way, to understand how building components relate to overall
energy usage. There are two main tracks of data-driven science: statistical analysis and
machine learning. For this study, the most effective approach to statistical analysis was found to
be Forward Regression, which was used to identify key components that have an impact on
energy usage. A ranking of the building components was established in relation to energy
consumption. For machine learning, the decision tree (J48 algorithm) was used due to its
flexibility, accuracy, and ease of use. The algorithm produces visual decision trees, which
are used to identify the critical building components. A second ranking order was established
using decision trees. A third ranking order, from the perspective of a building designer, was also
acquired for comparison. All the rankings were seen to be different. The three rankings, arranged
in descending order, are illustrated in Figure 6.1. Note that the factors were chosen to align with the
information given in the dataset, including the limitations that come with it. To validate whether the
rankings are effective and which of the three was the most accurate, energy simulations were
performed.
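To illustrate why a decision tree surfaces critical components, the sketch below computes the gain ratio, the split criterion used by C4.5, of which Weka's J48 is an implementation. It is a simplified stand-in and not the Weka workflow used in the study: it scores each feature once at the root instead of growing a tree, and the four-row dataset, the feature names and the "low"/"high" EUI labels are all hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(rows, feature, target):
    """Information gain of splitting on `feature`, normalised by split information."""
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row[target])
    total = len(rows)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy([row[target] for row in rows]) - remainder
    split_info = entropy([row[feature] for row in rows])
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy records: in this made-up data the HVAC type fully
# determines the EUI class and shading carries no information.
rows = [
    {"hvac": "chiller", "shading": "yes", "eui": "low"},
    {"hvac": "chiller", "shading": "no", "eui": "low"},
    {"hvac": "district", "shading": "yes", "eui": "high"},
    {"hvac": "district", "shading": "no", "eui": "high"},
]

ranking = sorted(["hvac", "shading"],
                 key=lambda f: gain_ratio(rows, f, "eui"),
                 reverse=True)
print(ranking)
```

Features with a higher gain ratio are chosen for splits nearer the root, which is one reason components that appear high in a tree can be read as more critical.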
Figure 6.1: The building component ranking systems (Left to Right: Building Designer, Regression, Decision Tree)
Of the many energy modeling software packages, Design Builder was selected due to the simplicity of
implementing weather data and the ease of modifying building components. A commercial building
was modelled based on the dataset used. ASHRAE-recommended components were used for the
initial control model. Improvements were then made to this model according to the ranking
systems and simulated to get the annual building energy usage values. The simulation outputs were
tabulated and analyzed to validate the three ranking systems.
Figure 5.15: Ranking Component Improvement Charts (Left to Right: Building Designer, Regression, Decision Tree)
The three rankings’ improvements were charted (Figure 5.15) to visualize the incremental
improvements. Amongst other findings, it was concluded that the decision tree was the most
accurate of the three ranking systems.
Furthermore, the decision tree was also able to rank the most and least recommended options
associated with low energy use. Through simulations, it was verified that the component options
most recommended by the decision tree produced more energy savings than the least recommended
options. The percentage energy savings are outlined in Figure
5.21. This was another verification step to realize the potential of the decision tree
methodology.
Figure 5.21: Individual Component Improvement (as per Decision Tree recommendations)
The goal was to find a simple data driven methodology to prioritize critical building components.
While the statistical methodology was ineffective, machine learning through decision trees
proves to be a viable option when finding the most effective components to target when
designing a building in a specific region. It is simple to implement whilst producing useful and
accurate findings.
6.2 Evaluation of the methodology and improvement to current workflow
The methodology presented in this study can be simplified slightly, as it involved practicing
different approaches, comparing them to find the best one, and then implementing it. (Figure
3.1)
Figure 3.1: Methodology Diagram
The layout, for the most part, can remain the same with the three phases: Data Management,
Data Analysis and Simulation & Validation. In the Strategies column, Regression can be
removed as it proved to be unfit for this task. Other data analysis strategies can be implemented
here for comparison. Once the critical parameters are recognized in phase 2, phase 3 can proceed
with validation of that recognition via simulations.
6.3 Limitations
Data science is highly reliant on the quality of the data. Whether the data is complete, detailed, and
plentiful has a large impact on the research output. In this study, though the given dataset was relatively
large and detailed, it was not detailed enough to conduct a more granular
investigation.
Shading, for example, can be a very effective tool in heat management, and its design greatly
affects its performance. However, this dataset recorded only whether or not the building had a shading
component. Orientation was also reduced to four options indicating which direction (E, S, etc.)
the large face of a building would face. However, the buildings’ footprint dimensions were not
specified, so it was assumed that they were rectangular with a 1:3 length-to-width ratio where the
longer side would face that orientation. Such grouping and assumptions limit the specificity of
the outcome.
Another limiting factor was the lack of variation in the dataset. For some components, like Wall
Material, most buildings had the same value (“Concrete Blocks”). This lack of variation was
also a contributing factor in the overall accuracy of the findings.
Though the overall research outcome produced results that identify how these factors compare, it
is a rough comparison because some factors lacked specificity and the data lacked
variety. As a result, the research outcome lacks the precision needed to formulate a new
guideline.
6.4 Research Applicability
Different regions of the planet have different weather patterns, and the culture of construction and
the component selection process also vary significantly. The data used in this study came from existing
buildings in Seoul, Korea. The rankings calculated were specific to the nature of that data. The
ranking could be vastly different if a much warmer region, like Bangladesh, were chosen. Because
current energy standards like ASHRAE are used globally, they lack specificity. Using this
methodology can indicate key components for a given region to improve upon. This can lead to a
more efficient and tailored energy guideline for that region. It would not only improve
sustainability but could also be economical, as designers or owners can allocate capital
according to component priority.
6.5 Future Work
There is enormous scope when it comes to the implementation of data science in building
science.
6.5.1 Using Other Machine Learning Algorithms
The rapid advances in data science allow easy implementation of complex algorithms. The J48
Algorithm used in this study was accurate and fruitful but not perfect. A better algorithm may
exist. Random Forest is another methodology that could potentially be used in this regard.
6.5.2 Implement Other Variables and Units
As noted in Section 6.3, a higher quality dataset will create a higher quality outcome. Data of
greater resolution (an example would be more options in the data for the “Shading” component
in this experiment) will enable the use of more specific variables, which
would in turn result in more useful and granular relationships. Hence, acquiring a higher
resolution dataset is key to addressing this issue.
Other variables, like cost, could potentially be implemented in the current workflow. The
relationship between investment in a building component and actual energy usage can be
investigated. Likewise, the factor of time in construction practices regarding various component
options can be explored to improve construction efficiency.
Thermal Energy Demand Intensity (TEDI) is another metric that can be used instead of Energy Use
Intensity (EUI). It represents annual heating energy demand per unit floor area, and its
unit is kWh/m2a. It is currently used only in the province of British Columbia in Canada to improve
the performance of building envelopes.
6.5.3 Automation and Speed Improvements
The current workflow requires manual input of data into software such as IBM SPSS and Weka.
Also, the multiple simulations conducted in Design Builder were done one at a time which can
be time consuming. This methodology for decision tree, although effective, can be sped up.
Although software limitations exist, perhaps the use of other software or plug-ins can bypass this
issue.
6.6 Summary
The field of Building Science adopts a holistic approach to sustainability. Though the integration
of multiple disciplines has proven beneficial, some domains remain underutilized. Amongst
these, data science has high potential for contribution due to the rapid surge in the availability of
data. Through the research conducted within this thesis, it has become evident that the inclusion
of data science in the field of building science is a practical, effective approach.
Current studies in the realm of data science in buildings suggested a need for a simple data-
driven methodology to better understand the components in a building that work together in
synergy. Energy standards, though useful, are generalized to meet the needs at a global scale.
Therefore, a methodology to create a more specific guideline can be of great benefit.
To conclude, this study was successful in comparing different data analysis techniques to
identify the criticality of building components regarding energy saving whilst considering
geographic location. Regression analysis proved to be inadequate, while the decision tree
technique showed the potential to produce accurate findings. However, establishing a more
specific building guideline recommendation would require overcoming key limitations like data
resolution and data variety. The knowledge gained from this study should be perceived as a
steppingstone towards the implementation of data science techniques in building science. The
scope is great here and should not be overlooked when it comes to furthering building expertise.
REFERENCES
1. Sadineni, S., Madala, S., & Boehm, R. (2011, July 30). Passive building energy savings: A
review of building envelope components. Retrieved October 24, 2020, from
https://www.sciencedirect.com/science/article/pii/S1364032111002504
2. Chan, K., & Chow, W. (1999, March 04). Energy impact of commercial-building
envelopes in the sub-tropical climate. Retrieved October 24, 2020, from
https://www.sciencedirect.com/science/article/pii/S030626199800021X
3. Ciampi, M., Leccese, F., & Tuoni, G. (2003, October 17). Ventilated facades energy
performance in summer cooling of buildings. Retrieved October 24, 2020, from
https://www.sciencedirect.com/science/article/pii/S0038092X03003396
4. Singh, M., & Garg, S. (2009, September 03). Energy rating of different glazings for Indian
climates. Retrieved October 24, 2020, from
https://www.sciencedirect.com/science/article/pii/S0360544209003521
5. Ahmad, I. (2009, August 19). Performance of antisolar insulated roof system. Retrieved
October 24, 2020, from
https://www.sciencedirect.com/science/article/pii/S0960148109003310
6. Ciampi, M., Leccese, F., & Tuoni, G. (2004, September 21). Energy analysis of ventilated
and microventilated roofs. Retrieved October 24, 2020, from
https://www.sciencedirect.com/science/article/pii/S0038092X0400218X
7. How does COMcheck calculate percentage of glazing? (n.d.). Retrieved October 24, 2020,
from https://www.energycodes.gov/resource-center/faqs/how-does-comcheck-calculate-
percentage-glazing
8. VanGeem, M. G. (CTLGroup); revised and updated by Colker, R. M. (Sustainable
Buildings Industry Council). (n.d.). Energy Codes and Standards. Retrieved October 24,
2020, from https://www.wbdg.org/resources/energy-codes-and-standards
9. Varley, J. (2019, September 18). ASHRAE 90.1 Vs. IECC - Is it Time to Take a Stand?
Retrieved October 24, 2020, from https://www.esmagazine.com/articles/99780-ashrae-
901-vs-iecc-is-it-time-to-take-a-stand
10. Amasyali, K., & El-Gohary, N. (2017, September 11). A review of data-driven building
energy consumption prediction studies. Retrieved October 24, 2020, from
https://www.sciencedirect.com/science/article/pii/S1364032117306093
11. Amasyali, K., & El-Gohary, N. (2017, September 11). A review of data-driven building
energy consumption prediction studies. Retrieved October 24, 2020, from
https://www.sciencedirect.com/science/article/pii/S1364032117306093
12. Mukhopadhyay, J., Baltazar, J., Kim, H., & Haberl, J. (1970, January 01). Comparison of
ASHRAE Standard 90.1, 189.1 and IECC Codes for Large Office Building in Texas),
Energy Systems Laboratory, Texas A&M University. Retrieved October 24, 2020, from
https://oaktrust.library.tamu.edu/handle/1969.1/152108
13. Steps to Develop a Baseline: A Guide to Developing an Energy Use and Energy Intensity
Baseline and the Reporting Requirements for the Save Energy Now LEADER Pledge.
(2006). doi:10.2172/862304
14. Han, J., Lu, L., & Yang, H. (2008, December 30). Investigation on the thermal performance
of different lightweight roofing structures and its effect on space cooling load. Retrieved
October 24, 2020, from
https://www.sciencedirect.com/science/article/pii/S1359431108004936
15. Chan, K., & Chow, W. (1999, March 04). Energy impact of commercial-building
envelopes in the sub-tropical climate. Retrieved October 24, 2020, from
https://www.sciencedirect.com/science/article/pii/S030626199800021X
16. Gugliermetti, F., Passerini, G., & Bisegna, F. (2003, October 14). Climate models for the
assessment of office buildings energy performance. Retrieved October 24, 2020, from
https://www.sciencedirect.com/science/article/pii/S0360132303001380
17. Fathi, S., Srinivasan, R., Fenner, A., & Fathi, S. (2020, September 02). Machine learning
applications in urban building energy performance forecasting: A systematic review.
Retrieved October 24, 2020, from
https://www.sciencedirect.com/science/article/pii/S136403212030575X
18. Deng, H., Fannon, D., & Eckelman, M. (2017, December 17). Predictive modeling for US
commercial building energy use: A comparison of existing statistical and machine learning
algorithms using CBECS microdata. Retrieved October 24, 2020, from
https://www.sciencedirect.com/science/article/pii/S0378778817327834
19. Azadeh, M. A., & Sohrabkhani, S. (2006). Annual Electricity Consumption Forecasting
with Neural Network in High Energy Consuming Industrial Sectors of Iran. 2006 IEEE
International Conference on Industrial Technology. doi:10.1109/icit.2006.372572
20. Dong, B., Cao, C., & Lee, S. (2004, November 26). Applying support vector machines to
predict building energy consumption in tropical region. Retrieved October 24, 2020, from
https://www.sciencedirect.com/science/article/pii/S0378778804002981
21. Bano, F., & Kamal, M. (2016). Examining the Role of Building Envelope for Energy
Efficiency in Office Buildings in India. Examining the Role of Building Envelope for
Energy Efficiency in Office Buildings in India.
22. Yoshino, H., Hong, T., & Nord, N. (2017). IEA EBC annex 53: Total energy use in
buildings—Analysis and evaluation methods. Energy and Buildings, 152, 124-136.
doi:10.1016/j.enbuild.2017.07.038
23. Local Representatives. (2019, October 31). Retrieved November 21, 2020, from
https://www.iccsafe.org/about/overview/international-code-adoptions/
24. Hoare, J. (2020, November 19). How is Variable Importance Calculated for a Random
Forest? Retrieved November 25, 2020, from https://www.displayr.com/how-is-variable-
importance-calculated-for-a-random-forest/
25. Stephanie. (2019, January 20). Stepwise Regression. Retrieved November 28, 2020, from
https://www.statisticshowto.com/stepwise-regression/
26. Löcher, A. (2020, September 08). Variable Importance in Random Forests. Retrieved
November 28, 2020, from https://blog.hwr-berlin.de/codeandstats/variable-importance-in-
random-forests/
27. Frost, J. (2019, June 13). Guide to Stepwise
Regression and Best Subsets Regression. Retrieved November 28, 2020, from
https://statisticsbyjim.com/regression/guide-stepwise-best-subsets-regression/
28. DesignBuilder SBEM v4.7 approved for England and Scotland EPCs and Part-L2 / Section
6 2015. (n.d.). Retrieved November 28, 2020, from https://designbuilder.co.uk/38-
designbuilder-latest-news
29. Random forest. (2020, October 25). Retrieved November 28, 2020, from
https://en.wikipedia.org/wiki/Random_forest
30. Sola, A., Corchero, C., Salom, J., & Sanmarti, M. (2020). Multi-domain urban-scale
energy modelling tools: A review. Sustainable Cities and Society, 54, 101872.
doi:10.1016/j.scs.2019.101872
APPENDIX
IBM SPSS Stepwise Regression Results
Weka J48 Decision Tree Results
Design Builder Simulation Results based on Rankings
Regression Results
Decision Tree Ranking
Building Designer Ranking
Abstract
Although building performance simulation and energy management technologies are in widespread use, buildings continue to raise environmental and energy resource issues. Considering the goals of Architecture 2030, the push to find new, innovative strategies to lower carbon emissions is more important now than ever. Existing building energy guidelines are also outdated and generalized, and the lack of coordination between them has created confusion, so an update in this regard would be useful.
The research follows a data analysis approach, using large amounts of existing building data to explore the relationships between building components and their impacts on energy usage. Using both statistical and machine learning techniques, building components are ranked according to their energy consumption impacts.
Prioritizing these critical components, a more tailored building energy guideline is to be formulated. Models are simulated using building simulation software to validate this argument. The proposed guideline should clarify current standards and propose a more specific and efficient guideline.
Asset Metadata
Creator
Khan, Muntaseer
(author)
Core Title
An analysis of building component energy usage: a data driven approach to formulate a guideline
School
School of Architecture
Degree
Master of Building Science
Degree Program
Building Science
Degree Conferral Date
2021-05
Publication Date
05/09/2021
Defense Date
05/07/2021
Publisher
University of Southern California
(original),
University of Southern California. Libraries
(digital)
Tag
building components,building energy guideline,energy simulation,machine learning,OAI-PMH Harvest,statistical analysis
Format
application/pdf
(imt)
Language
English
Contributor
Electronically uploaded by the author
(provenance)
Advisor
Choi, Joon-Ho (
committee chair
), Chiang, Yao-Yi (
committee member
), Schiler, Marc (
committee member
)
Creator Email
mmkhan@usc.edu,muntaseer.khan99@gmail.com
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-oUC112720104
Unique identifier
UC112720104
Identifier
etd-KhanMuntas-9614.pdf (filename)
Legacy Identifier
etd-KhanMuntas-9614
Document Type
Thesis
Format
application/pdf (imt)
Rights
Khan, Muntaseer
Type
texts
Source
20210510-wayne-usctheses-batch-836-shoaf
(batch),
University of Southern California
(contributing entity),
University of Southern California Dissertations and Theses
(collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright. The original signature page accompanying the original submission of the work to the USC Libraries is retained by the USC Libraries and a copy of it may be obtained by authorized requesters contacting the repository e-mail address given.
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email
cisadmin@lib.usc.edu
Tags
building components
building energy guideline
energy simulation
machine learning
statistical analysis