OFFICE FLOOR PLANS GENERATION BASED ON GENERATIVE ADVERSARIAL
NETWORK
by
Zhequan Zhang
A Thesis Presented to the
FACULTY OF THE USC SCHOOL OF ARCHITECTURE
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
MASTER OF BUILDING SCIENCE
August 2021
Copyright 2021 Zhequan Zhang
ACKNOWLEDGEMENTS
First of all, I want to express my most sincere gratitude to the chair of my committee, Professor
Karen Kensek. It was she who helped me through the many challenges in forming the initial idea
and in the process of writing the thesis. This paper would not have been completed without her. I
thank her for giving me the opportunity to obtain the data set, for her patience, and for nudging me
online to keep me on track. I also want to thank my other two committee members, Professor
Joon-Ho Choi and William Carney, for their time and their suggestions on the direction of my thesis.
Professor Choi helped me with the office design background research and with machine learning.
The data set was provided by William Carney, who also ran the Dynamo code I wrote to derive the
training set; the export process for each model is very time consuming. This paper could not have
been completed without the data set William provided.
I am also very grateful to Professor Mark Schiler. At one point I wanted to give up; it was his
encouragement and help that got me through the most difficult semester of my life.
I also want to acknowledge the help of Professor Hao Zheng from UPenn and thank him for providing
the data set from his previous research for my practice. Thanks to alumni Zhihe Wang and Zhiying Liu
for their help with the Dynamo code. Without their suggestions, my code might have taken much more
time to complete.
I would like to express my deep gratitude to my whole family, who have been fully supportive.
Thanks to my girlfriend, Zhou Ye, who accompanied me through the whole year of the epidemic.
She comforted me when my thesis ran into unexpected difficulties, took the initiative to analyze
the problems with me, and offered her suggestions. I also thank her for taking care of my daily life.
Because of the epidemic, this year did not go according to the life I had dreamed of at the
University of Southern California. I used to feel confused and pained, but no longer. I am happier,
and grateful for the help from so many people. I was able to complete this thesis because of
you.
TABLE OF CONTENTS
Acknowledgements ............................................................................................................................................... ii
Table of Contents ................................................................................................................................................. iii
List of Figures. ...................................................................................................................................................... v
Abstract ................................................................................................................................................................ xi
Chapter 1: Introduction ........................................................................................................................................ 1
1.1 Commercial Office ................................................................................................................. 1
1.2 BIM ........................................................................................................................................ 3
1.3 Machine Learning .................................................................................................................. 9
1.4 Neural Network .................................................................................................................... 10
1.5 Convolutional Neural Network ............................................................................................ 11
1.6 Generative Adversarial Network – Pix2pix & Pix2pixHD .................................................. 21
1.7 Summary .............................................................................................................................. 25
Chapter 2: Background Research ........................................................................................................................ 26
2.1 AI & Architecture ................................................................................................................ 26
2.2 BIM Customization — IMAGINIT Clarity ......................................................................... 29
2.3 Automated Room Layout By GANs .................................................................................... 32
2.4 Summary .............................................................................................................................. 38
Chapter 3: Methodology ..................................................................................................................................... 39
3.1 Theoretical methodology .................................................................................................... 39
3.2 Actual Methodology Used .................................................................................................. 56
3.3 Summary ............................................................................................................................. 58
Chapter 4: Detailed Actual Methodology ........................................................................................................... 61
4.1 Revit Data to Images ............................................................................................................ 61
4.2 Build Training Set by Photoshop ......................................................................................... 74
4.3 Future Work Application ..................................................................................................... 84
4.4 Summary .............................................................................................................................. 93
Chapter 5: Detailed Training Process ................................................................................................................. 95
5.1 Training ................................................................................................................................ 96
5.2 Test ..................................................................................................................................... 109
5.3 Summary ............................................................................................................................ 113
Chapter 6: Conclusion, Discussion, and Future Work ........................................................................................ 115
6.1 Context ............................................................................................................................... 115
6.2 Methodology ...................................................................................................................... 117
6.3 Limitations ......................................................................................................................... 122
6.4 Future Work ....................................................................................................................... 124
6.5 Conclusion ......................................................................................................................... 126
References ......................................................................................................................................................... 128
Appendices ....................................................................................................................................................... 131
Appendix A: Pix2pixHD code analysis train.py ....................................................................... 131
Appendix B: Pix2pixHD code analysis test.py ......................................................................... 138
LIST OF FIGURES
Figure 1-1: Effect of office space layout(Du et al., 2020) ...................................................... 2
Figure 1-1: Properties of elements in Revit ............................................................................ 5
Figure 1-2: Category, Family, and Type of the wall ................................................................ 6
Figure 1-3: Dynamo nodes for getting a plan drawing ........................................................... 7
Figure 1-4: Auto generation of section view .......................................................................... 8
Figure 1-5: Dynamo nodes of Parametric Design Bridge ....................................................... 8
Figure 1-6: Parametric Design Bridge .................................................................................... 9
Figure 1-7: Façade that changes with the angle of sunlight created in Dynamo .................... 9
Figure 1-8: Machine Learning Subcategory ......................................................................... 10
Figure 1-9: CNN decides whether it is an X or an O .................................................................. 12
Figure 1-10: The X and O in the graphics ............................................................................ 13
Figure 1-11: CNN convolution difference ............................................................................ 13
Figure 1-12: 3x3 filter ........................................................................................................... 13
Figure 1-13: Filter ................................................................................................................. 14
Figure 1-14: Multiplication result chart one by one ............................................................. 14
Figure 1-15: Different patch with the same filter ................................................................. 15
Figure 1-16: Feature map ...................................................................................................... 15
Figure 1-17: Two filters’ feature maps ................................................................................. 16
Figure 1-18: Achieving the maximum in the feature map patch ................................................ 17
Figure 1-19: Rectified linear units .......................................................................................... 17
Figure 1-20: Feature map after ReLU unit ........................................................................... 18
Figure 1-21: Convolution activation to output structure diagram ........................................ 18
Figure 1-22: Every value gets a vote .................................................................................... 19
Figure 1-23: Certain value has strong weight for the X outcome ......................................... 19
Figure 1-24: Result sample of fully connected layer ............................................................ 20
Figure 1-25: GAN training process ..................................................................................... 22
Figure 1-26: Image-generation type GAN ............................................................................ 23
Figure 1-27: Example of image translation .......................................................................... 24
Figure 1-28: Encoder-decoder & U-net ................................................................................ 24
Figure 1-29: U-net detail ....................................................................................................... 24
Figure 2-1: IMAGINiT ......................................................................................................... 29
Figure 2-2: Project Query interface ...................................................................................... 30
Figure 2-3: Project selection ................................................................................................. 30
Figure 2-4: Selection interface .............................................................................................. 30
Figure 2-5: The spreadsheet of extracted data ...................................................................... 31
Figure 2-6: SQLQuery .......................................................................................................... 31
Figure 2-7: Data sheet of one room ...................................................................................... 32
Figure 2-8: Edit room’s data ................................................................................................. 32
Figure 2-9: Pix2pix translate the image of room shape to room floor plan .......................... 33
Figure 2-10: Training set processing .................................................................................... 34
Figure 2-11: Training process result from different epoch ................................................... 34
Figure 2-12: Chinese apartment ............................................................................................ 35
Figure 2-13: Stacked Pix2pix model ..................................................................................... 36
Figure 2-14: Specific-area to floor-shape translation ........................................................... 36
Figure 2-15: Part of training result from Model II ................................................................ 36
Figure 2-16: Training results from Model III ....................................................................... 37
Figure 3-1: Theoretical methodology ................................................................................... 39
Figure 3-1: Exported images ................................................................................................. 40
Figure 2-3: Full Dynamo code .............................................................................................. 41
Figure 3-4: Partial Dynamo code .......................................................................................... 41
Figure 3-5: Extracting elements by category in Dynamo ..................................................... 42
Figure 3-6: Subset of code to find offices ............................................................................. 42
Figure 3-7: Subset of Dynamo code to finish selecting a room ............................................ 43
Figure 3-8: Dynamo code to crop the view .......................................................................... 43
Figure 3-9: Current output of code ....................................................................................... 44
Figure 3-10: sample color block images ............................................................................... 45
Figure 3-11: Portion of Dynamo code for setting parameters .............................................. 46
Figure 3-12: Portion of code for setting visual style ............................................................. 46
Figure 3-14: Code for setting color for windows and curtain walls ..................................... 47
Figure 3-15: Hide elements in the views .............................................................................. 48
Figure 3-16: Sample image of doors and windows converted to colored rectangles ........... 48
Figure 3-17: Part of Excel database showing names of furniture in rooms .......................... 49
Figure 3-18: Part of code for filtering the furniture .............................................................. 49
Figure 3-19: Python script to override the color of an object ............................................... 50
Figure 3-20: Result of Python code in Revit ........................................................................ 51
Figure 3-21: Doors, windows, and furniture set to color blocks .......................................... 51
Figure 3-22: Node export image by View in ArchiLab package .......................................... 51
Figure 3-23: Result of Dynamo code .................................................................................... 52
Figure 3-24: Original and final images set to same standard size and aspect ratio. ............. 52
Figure 3-25: Paperspace platform startup interface .............................................................. 53
Figure 3-26: Rent Service Desktop ....................................................................................... 53
Figure 3-27: The hyper-parameters setting in Shell language .............................................. 54
Figure 3-28: Each group from left to right: Group 1(input), Group 2(output answer), output
from the generator ......................................................................................................... 55
Figure 3-29: G&D trained model at every 10 epochs ..................................................... 55
Figure 3-30: Methodology: this step is difficult to do ........................................................... 56
Figure 3-31: Methodology used ............................................................................................ 56
Figure 3-33: Detailed difference in theoretical methodology (above) and actual
methodology (below) .............. 58
Figure 3-34: Actual methodology used ................................................................................. 59
Figure 4-5: Actual methodology ........................................................................................... 61
Figure 4-6: Actual methodology step 1 ................................................................................ 61
Figure 4-7: Dynamo code in 2 files ...................................................................................... 62
Figure 4-8: Property interface in Revit with Parameter Room Bounding ............................ 62
Figure 4-9 : (Left to right) 1. Room in Revit 2. Boundary line of the chosen room in Dynamo
without operation of room bounding 3. Error from unclosed boundary for the view
creation .......................................................................................................................... 63
Figure 4-10 : Whole nodes for changing room bounding of columns .................................. 63
Figure 4-11: Part II: Export views ........................................................................................ 64
Figure 4-12: Dynamo code for creating the view ................................................................. 64
Figure 4-13: (Left to right) one keyword, one-keyword list, two-keyword list .................... 65
Figure 4-14: Booleans.AnyTrue ........................................................................................... 65
Figure 4-15: The list order is kept ......................................................................................... 66
Figure 4-16: Result from step 1 ............................................................................................ 67
Figure 4-17: Hide auxiliary lines .......................................................................................... 67
Figure 4-18: View.HideElements ......................................................................................... 68
Figure 4-19: Part I ................................................................................................................. 68
Figure 4-20: The "line" is a section view line ....................................................................... 69
Figure 4-21: Part II ................................................................................................................ 69
Figure 4-22: List structure .................................................................................................... 70
Figure 4-23: Keep list structure in List.Equals ..................................................................... 70
Figure. 4-24: Result from steps 1 and 2 ................................................................................ 70
Figure 4-25: Visual Style Select ........................................................................................... 71
Figure. 4-26: Result from steps 1 through 3 ......................................................................... 71
Figure 4-27: Step4 - views output ......................................................................................... 72
Figure 4-28: Subset of 243 images (upper) and zoomed in view of three of them (lower) .. 73
Figure 4-29: Actual methodology part 2 ............................................................................... 74
Figure 4-30: Entire process of part 2 in detail (top); lower two images are zoomed in versions
of the top complete image ............................................................................................. 75
Figure. 4-31: Actions window in Photoshop ........................................................................ 76
Figure 4-32: Creating action set: Resized ............................................................................. 76
Figure 4-33: Actions is recording ......................................................................................... 77
Figure 4-34 The image layer is initially closed .................................................................... 77
Figure 4-35 Image 1 shrink in the original size background ................................................ 78
Figure 4-36: Change size from original size to 1024*1024 .................................................. 78
Figure 4-37 Option of Saving as JPG image ........................................................................ 79
Figure 4-38 Actions window after recording all operations ................................................. 79
Figure 4-39: Batch running ................................................................................................... 80
Figure 4-40 Results from step 1 (Group3) ............................................................................ 80
Figure 4-41: Open image by Photoshop ............................................................................... 81
Figure 4-42: Change path to shape ....................................................................................... 81
Figure 4-43: Draw the shape ................................................................................................. 81
Figure 4-44: Set color ........................................................................................................... 82
Figure 4-45: Images of Group 1 ............................................................................................ 82
Figure 4-46: Color the furniture ............................................................................................ 82
Figure 4-47: Rotated and mirrored operations records by actions window .......................... 83
Figure 4-48: Part of Groups 1,2, & 3 (upper) and zoomed in views of six sets (lower) ....... 84
Figure. 4-49: Future work methodology diagram ................................................................. 84
Figure. 4-50: Future work process ........................................................................................ 85
Figure. 4-51: Future work Dynamo code .............................................................................. 86
Figure 4-52: Dynamo: image Input ....................................................................................... 86
Figure 4-53: Dynamo: pixel sampling .................................................................................. 87
Figure 4-54: Coordinated from the list ................................................................................. 87
Figure 4-55: Dynamo code for center coordinate of Blue .................................................... 88
Figure 4-56: Vector of furniture movement .......................................................................... 88
Figure 4-57: Center of the diagonal ...................................................................................... 89
Figure 4-58: Color filter ........................................................................................................ 89
Figure 4-59: Customized Python code and its output list ..................................................... 90
Figure 4-60: Center by max and min points ......................................................................... 90
Figure 4-61: Get Door Coordinate ........................................................................................ 91
Figure 4-62: Door’s Revit data location(upper) and door’s images location ....................... 91
Figure 4-63: Calculation ....................................................................................................... 92
Figure 4-60: 3D Revit office ................................................................................................. 92
Figure 5-64: 3 Groups images from Chapter 4 (top); larger views of two examples for the
three sets........................................................................................................................ 95
Figure 5-65: Training sets for 3 models ................................................................................ 96
Figure 5-66: Nested model compared to Model 3 ................................................................ 96
Figure 5-67: Training process of pix2pixHD ........................................................................ 97
Figure 5-68: Screen shot from the Linux terminal ................................................................ 97
Figure 5-69: Graphic card check by !nvidia-smi .................................................................. 98
Figure 5-70: Parts of the hyperparameters in the folder ............................................................... 98
Figure 5-71: Training instructions ........................................................................................ 99
Figure 5-72: Training set detection ........................................................................................ 99
Figure 5-73: In every group:(from left to right) Input, Synthesized, Ground truth ............. 99
Figure 5-74: Result from Model 1 Ver 2 epoch 1, .............................................................. 100
Figure. 5-75: Screen shot from Linux terminal .................................................................. 101
Figure 5-76: Loss Value for every 100 images: X axis: Loss value, Y axis: epoch ........... 102
Figure 5-77: One random generated image from every epoch in model 1_2 (above); enlarged
view of training start and end showing success as the images have gotten less blurry
(below) ........................................................................................................................ 103
Figure 5-80: One random .................................................................................................... 106
Figure 5-81: Model 3 .......................................................................................................... 107
Figure 5-82: Results of Model 3 from epoch 50,100,150,200 ............................................ 107
Figure 5-83: Learning rate (Jordan, 2018) .......................................................................... 108
Figure 5-84 Nested Model by Model 1 and Model 2 .......................................................... 109
Figure 5-85 Part of test set for Model 1 .............................................................................. 109
Figure 5-86: Test hyperparameter setting ........................................................................... 110
Figure 5-87: Input images and its synthesized images from test of Model 1 ..................... 111
Figure 5-88: Input, Synthesized, Ground truth ................................................................... 111
Figure 5-89 Nested process test .......................................................................................... 112
Figure 5-90: The area around the color block is blurry in Model 1 synthesized images .... 112
Figure 5-91: New office room shape synthesized images have low image quality ............... 113
Figure 6-92: One sample for whole process ....................................................................... 115
Figure 6-2: Apartment floor plan generation by Pix2pix (Zheng, 2018) ............................ 116
Figure 6-3: Nested model for apartment floor plan generation (Chaillou,2019) ................ 117
Figure 6-4: Actual Methodology ........................................................................................ 118
Figure 6-5: Training sets for Model 1, 2 & 3 ...................................................................... 118
Figure 6-6: 3 Groups images from Chapter 4 (top); larger views of two examples for the three
sets............................................................................................................................... 119
Figure 6-7: One random generated image from every epoch in model 1_2 (above); enlarged
view of training start and end showing success as the images have gotten less blurry
(below) ........................................................................................................................ 120
Figure 6-8: Nested model vs. Model 3 ................................................................................ 121
Figure 6-9: Test process of nested model ........................................................................... 121
Figure 6-10: Synthesized image to Revit 3D model ........................................................... 122
Figure 6-11: Model 1 blur synthesized image causes door was placed in the wall ............ 123
Figure 6-12: L-shape table in synthesized image ............................................................... 125
ABSTRACT
Driven by machine learning and faster computers, automation and artificial intelligence (AI) will
profoundly impact many industries, from self-driving trucks to personal assistants that manage
schedules and financial applications that replace personal accounting. By systematizing repetitive
tasks, automation can improve cost savings, reliability, and productivity. At the same time, it
allows humans to focus on higher-value and more complex tasks.
Machine learning provides opportunities for automation in the building design process. A new
type of machine learning model, Generative Adversarial Network (GAN), makes it possible to
generate floor plans automatically.
Based on the BIM data set of an architecture firm, Dynamo (a visual programming language for
Revit) was used to obtain a training set and a test set, two groups of paired pictures. One type of
GAN, called pix2pixHD, was trained on the training set, a process of learning the relationship
between the two images in each pair. Given a certain room shape, the trained model predicts a floor
plan with the locations of all furniture. The predicted result is then converted back into a room
with furniture in Revit.
Unforeseen problems led to workarounds in the process. The training set could not be fully
automated and exported by Dynamo from the BIM database; it required manual operation with the
assistance of Photoshop. The quality of the final training set was not as high as hoped for, but the
model can still predict the layout of the office. The function of placing furniture in the Revit model
was applied to a simple office room, but could not be applied to a more complex office. The
theoretical process can be achieved, but with more work. The actual process used did provide a
desired outcome using machine learning to create office plans.
Research Objectives
• To generate office room plan view images (the training set) in a consistent format with
Dynamo in Revit.
• To generate predicted furniture layout images (offices) with a Generative Adversarial
Network (pix2pixHD).
• To create new 3D Revit offices.
• To learn the limitations of using Dynamo and machine learning for an architecture
project.
KEYWORDS: Machine learning, GANs, Pix2pixHD, BIM, Revit, Dynamo
1. INTRODUCTION
As a way to achieve artificial intelligence, machine learning has had a profound impact on many
industries. Machine learning is usually described as a subset of artificial intelligence: artificial
intelligence is an umbrella term for a computer doing anything human-like, deep learning is a
subset of machine learning, and machine learning is a subset of artificial intelligence. Machine
learning is a technology that uses programming to give computers the ability to learn: by learning
from given data (a training set), a model can predict changes in data or classify data (Li, 2019).
With the increase in computer processing speed, computers have begun to run very complex
machine learning algorithms. This improvement also allows machine learning to meet or exceed
human performance in a range of tasks, which opens up the possibility of automation in all industries.
Although fewer than 5% of occupations can be fully automated with current technology, in most
occupations (about 60% of them) at least 30% of the constituent activities can be automated (James et al., 2017).
Artificial intelligence provides opportunities for automation in the building design process
(Chaillou, 2019). A new type of machine learning model, the Generative Adversarial Network (GAN),
makes automatic floor plan generation possible (Zheng et al., 2017).
Simultaneously, building information modeling (BIM), which contains a large number of
building parameters, has created opportunities for introducing machine learning into architecture.
The required data need to be extracted and organized, such as a floor plan or an energy
consumption spreadsheet, which can then be used as a training set for machine learning.
DLR Group, an architecture and engineering design firm, provided a large amount of office BIM
data. The floor plans were extracted from Revit (a BIM software program) to become a training set
for machine learning. By training a machine learning model, office floor plans can be generated
automatically.
To build a machine-learning-based tool that automatically generates office layout images, it is
necessary to understand office layout design, the application of building information modeling
software, and the concepts and technologies of artificial intelligence.
1.1 Commercial Office
The layout of an office room is connected with office workers’ productivity and their relationships.
An increase in productivity can be achieved if the facility manager strikes a balance between
private space and public shared space in the office environment, and that balance largely depends
on the mix of office work patterns (Barry P, 2008).
An optimal match between work processes and office environment patterns can be made with
correspondingly different physical environments (Fig. 1-1). The hive office is characterized by
individual working routines with low levels of interaction between co-workers. The cell office is
highly concentrated, with little interaction. The den office is usually used for group process work,
with a certain level of interaction. The club office is both highly autonomous and highly interactive
(Laing et al., 1998). The combined office suits work composed mostly of concentrated personal
tasks with a need for spontaneous communication.
Figure 1-1: Effect of office space layout(Du et al., 2020)
Office workers want to be able to perform solo work without interference, yet they also cherish the
opportunity for two-way interaction with colleagues. An overly open office, however, causes people
to interact informally throughout the day, leading to a loss of personal and team performance. Open
and shared offices draw the most complaints about loss of privacy and annoying interruptions. High-
density offices that make people feel crowded make simple tasks easier and complex tasks
more difficult.
The goal of a high-performance workplace is to match the physical environment of the work to the
individual's work needs (Barry P, 2008). Once a clearer work pattern classification and preferred
work style are confirmed, people can accordingly set up a positive office environment based on
the work pattern.
1.2 BIM
The concept of building information modeling (BIM) has existed for nearly 40 years (Lee, 2008).
It is an advanced three-dimensional digital building model. Throughout the life cycle of a building
or facility, it provides a collaborative platform for personnel in all aspects of the building to create
and manage digital information of the entire building. From this platform, quality and efficiency
of planning, design, construction, operation and maintenance phases can be improved (Kensek,
2014). BIM can provide continuous, real-time information about project design scope, construction
progress, and cost; it can continuously update this information in the digital environment and give
users access to information that is complete, reliable, and fully coordinated.
Throughout the process, BIM uses three-dimensional, real-time and dynamic models to contain all
the information, including geometry, space, geographic information and natural information of
building components (Cristina, 2019).
BIM has realized the conversion of architectural design tools from 2D to 3D. It improves on
traditional computer-aided design (CAD), which generates two-dimensional (2D) drawings of objects
such as buildings, electrical layouts, and mechanical parts (Kensek, 2014). However, BIM is more
than just 3D modeling. Unlike CAD, it also contains accurate virtual models of the geometry
together with related information parameters. BIM provides an integrated building information
database, composed of 3D parametric objects, for the architecture, engineering, construction, and
operations (AECO) industry. These parameters can be adjusted and queried in complex ways
through the interface between programs and the 3D components (Kensek, 2014).
BIM is a tool supported by digital technology, which collects and shares all information and data
related to the life cycle of a building. And it successfully changes the working process of the
construction, engineering, and building industries (Liu, 2018).
Throughout the building's entire life cycle, BIM can integrate all project-related information
through parameter models. All this information is automatically shared and transferred through
every step of planning, operation, and maintenance. This allows architects, engineers, and other
professional consultants to understand and make an effective response to building information. It
can significantly improve the entire building process's efficiency and reduce various risks (Xu et
al., 2014).
Coordination is the most widely used application of BIM before and during construction (Liu,
2018). If any problems appear during implementation, the owners, architects, engineers, and
contractors involved in the project get together to find the cause and a solution. Poor
communication between different disciplines leads to collision problems in other parts of the
building (Zhang, 2015). One example is the HVAC pipeline layout: in the actual construction
process, there may be structural components that obstruct the pipeline layout.
For this reason, BIM coordination services are often applied in the process of solving collision
problems. Besides that, BIM can simulate the actual construction process to solve possible problems
in the early stages of design rather than in the later stages of actual construction (Meadati, 2008).
It can be used not only as a practical guide for later construction, but also as a feasibility guide,
because it can provide a reasonable construction plan, the allocation of personnel and materials,
and maximum rational use of resources (Bouazza, Udeaja & Greenwood, 2015).
In the architectural design process, BIM not only contains building information; when information
changes, it also records the corresponding feedback, which brings the user a new design
method: parametric design (Liu, 2020). Parametric design is a digital modeling process based on
many standardized, pre-programmed rules or algorithms called “parameters” (BIM Wiki, 2019).
BIM technology can make design more diversified, and for some complex designs it plays a very
useful guiding role. By changing the parameters, users can change the architectural form. BIM can
present different performance analyses and design comparisons so that designers can choose the
best option. Besides, the software can help architects deal with complex and repetitive tasks, so
architects can spend more time developing designs and improving quality (Liu, 2016).
With the advent of BIM, users can integrate and manage more information simultaneously, such
as project time, program control, project budget, etc. (Lin et al., 2013). To help users monitor and
inspect the design and construction phases, virtual building components can be used to display an
actual building's appearance. BIM can also provide a performance analysis, such as daylight
analysis and simulation, airflow analysis, thermal energy analysis, and noise analysis, to create
sustainable buildings with high performance but low energy consumption. After the construction
is completed, BIM can also track various equipment in the building to make operation and
maintenance more real-time and accurate (Li, 2014).
1.2.1 Revit
Autodesk Revit is a specific BIM software program that allows architects and other building
specialists to design and document buildings by creating parametric 3D models containing
geometric and non-geometric design and building information. It contains all components and
elements, including 3D geometric shapes and textures, as the digital equivalent of the actual
building parts. These components include geometry and data, which theoretically could cover all
the information required during construction. For example, Revit displays all relevant
information about a window (Fig. 1-2).
Figure 1-1: Properties of elements in Revit
A “family” in Revit is a group of elements that share a common set of parameters, in name and
meaning, and a related graphical representation. Within one family, the parameters of different
elements belong to the same set, but they may have different values. The group of these elements
under a specific “family” is called a “family type,” or simply a “type” (Wang, 2020).
Figure 1-2: Category, Family, and Type of the wall
Families are necessary for Revit users to create objects (Autodesk, 2016). A family contains non-
graphical element attribute information. When an element is selected, the properties palette
displays the instance attributes, which include information related to that particular family instance.
A family instance is an actual element in the project; it has a specific location and ID in
the model (Wang, 2020). When users need to create a new type, they can simply copy an existing
type and change the old attributes to new ones. Families can be objects like windows or even
views like perspectives.
An experienced Revit user can import existing models from other programs, as well as create
realistic and accurate families from lighting fixtures to furniture. Revit families can be created as
parametric models with dimensions and attributes. By changing predefined parameters such as
height, width, or number, Revit can allow users to modify a given component. In this way, a family
defines a geometry that is controlled by parameters. Each combination of parameters can be saved
as a type, and each occurrence of a type can also contain further variations.
Like other BIM software, Revit is also programmable. Users can create a parametric
component by using a graphical “family editor” instead of complex direct coding, and the models
in Revit connect all the relationships between views, components, and annotations in order to
automatically propagate changes to any element, keeping the model connected and all
documentation coordinated (Boeyken, 2012). Custom features can be programmed into Revit
through the Revit API (Application Programming Interface) or Dynamo.
1.2.2 Dynamo
Dynamo is a visual programming tool that is usually used together with Revit. It is a visual,
node-based programming language for accessing the Revit API (Application Programming Interface)
by manipulating and combining Dynamo nodes (Autodesk, 2019). Visual programming languages
are widely used in other disciplines, and in the last several years they have become an important
supplement to 3D modeling in the construction and engineering fields.
The relationship between Dynamo and Revit is similar to the relationship between Grasshopper
and Rhino. Rhino is a 3D modeling program, and Grasshopper is its visual programming language.
The difference is that Grasshopper is not used with BIM software, whereas Dynamo is used with
Revit and is able to read the model parameters (Kensek, 2015). Dynamo extends the functionality of
Revit by giving easy access to the Revit API in a more approachable way. Instead of text-based coding,
users create Dynamo programs by connecting graphic elements called “nodes.” This is a
better method of programming for coders who prefer graphic interfaces or have little background
in programming. Meanwhile, Dynamo is also an open-source tool, and many useful packages
written by other people are shared in the Dynamo package library (Wang, 2020). Every
Dynamo user is encouraged to combine existing nodes to create new ones.
In Dynamo, each node has a specific assignment. Nodes have inputs and outputs. The output of
one node is connected to the input of another node by a “wire.” The program or “graph”
flows from one node to another through the network of wires. The result is a graphic representation
of the steps required to achieve the final design (Fig. 1-4).
Figure 1-3: Dynamo nodes for getting a plan drawing
Repetitive work is normal in some aspects of modeling and documentation. For example, the steps
for automatically generating a scene include extracting the space and setting the position of the 3D
camera, the viewing direction, and the range of view angles. With Dynamo, 3D
perspective views can be generated automatically (Fig. 1-5). At the same time, additional functions
can be implemented, such as the consistent naming of the 3D images.
Figure 1-4: Auto generation of section view
Dynamo can not only automate tasks and export and import data; it is also a powerful design
tool. With Dynamo, when users clearly define generative design rules, design truly becomes a
process. By encoding the rules in a computing framework, users can use them to generate
hundreds (if not thousands) of options (Fig. 1-6 and 1-7).
Figure 1-5: Dynamo nodes of Parametric Design Bridge
Figure 1-6: Parametric Design Bridge
With Dynamo, building performance can be easily simulated throughout the design process.
Quickly determining which design performs better can help guide users toward the best solution
(Fig. 1-8). Computational design tools such as Dynamo provide a way to build certainty into the
design process, not only after the building is completed.
Figure 1-7: Façade that changes with the angle of sunlight created in Dynamo
1.3 Machine Learning
Artificial intelligence is an emerging field of technology and science that simulates and extends
the theory, methods, technology, and application systems of human intelligence (Arkoudas & Bringsjord, 2005).
Artificial intelligence (AI) is a branch of computer science; research in this field includes robotics,
language recognition, image recognition, natural language processing, expert systems, and more
(McPherson, 2018). The theory and technology of AI have become increasingly mature, and the
field of application has continued to expand.
AI is being used in fields such as autonomous driving, image recognition, and text
recognition, enabling intelligent tools, software applications, and web pages.
However, “artificial intelligence” is mainly a term that makes the field easy for the public to understand;
the technology that has developed most in recent years is deep learning. Artificial
intelligence, machine learning, and deep learning are nested concepts (Fig. x): machine
learning is an important sub-topic of artificial intelligence.
Machine learning allows computers to acquire the ability to learn. Ordinary programming means
that humans manually write every instruction for the computer to execute, while machine
learning is like indirect programming, where humans only need to provide data (Molnar, 2020).
Machine learning is used for the prediction and analysis of data, especially of unknown new data,
and predictions about data can lead to new discoveries. Machine learning is commonly divided into
categories that include supervised learning and unsupervised learning (Mitchell, 1997) (Fig. 1-9).
Figure 1-8: Machine Learning Subcategory
1.3.1 Supervised Learning
Supervised learning refers to the machine learning problem of learning predictive models from
labeled data. The labeled data represent the corresponding relationship between input and output,
and the prediction model produces the corresponding output for a given input (Kotsiantis, 2007). For
example, in order to predict a house price, past sale prices are used as indicators for
a future price. The machine learning model is fed a training set, such as the prices of
houses along with their locations, sizes, and so on. In this case, all of the given data are labeled
data that relate the features of each house to its sale price. The
ultimate goal of supervised learning is to obtain a trained model with a predictive function, as
sketched below.
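As an illustration only (this is not code from the thesis workflow), the house-price example can be sketched in a few lines of Python. The sketch assumes the scikit-learn library is available, and the feature values and prices are invented for demonstration.

# Minimal sketch of supervised learning: predicting house prices from labeled data.
# The features (size in square feet, number of rooms) and prices are invented examples.
from sklearn.linear_model import LinearRegression

X_train = [[1200, 2], [1500, 3], [1800, 3], [2400, 4]]   # inputs: size, rooms
y_train = [300000, 360000, 420000, 540000]               # labels: past sale prices

model = LinearRegression()
model.fit(X_train, y_train)        # learn the relationship between input and output

print(model.predict([[2000, 3]]))  # predict the price of an unseen house

Here the labels (past sale prices) supervise the training; the trained model is then used purely for prediction.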
1.3.2 Unsupervised Learning
Unsupervised learning refers to the machine learning problem of learning predictive models from
unlabeled data (Li, 2018). For example, suppose there are many images of dogs and cats without
labels; nobody has categorized the training set for the machine learning model. The model has to
figure out the difference between the cats and the dogs by itself, for instance by clustering similar
images together, as sketched below.
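As a hedged illustration (again not thesis code), clustering is one common form of unsupervised learning. The sketch below assumes scikit-learn is available and uses invented two-dimensional points standing in for image features.

# Minimal sketch of unsupervised learning: grouping unlabeled data by clustering.
import numpy as np
from sklearn.cluster import KMeans

features = np.array([[0.10, 0.20], [0.20, 0.10], [0.15, 0.25],    # one natural group
                     [0.90, 0.80], [0.85, 0.90], [0.95, 0.85]])   # another natural group

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)
print(kmeans.labels_)   # e.g. [0 0 0 1 1 1]: two groups found without any labels

No labels are provided; the algorithm separates the two groups on its own, much as a model would have to separate cat images from dog images.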
1.4 Neural Network
A neural network (NN) is a kind of machine learning model that simulates the neurons in the human
brain in order to achieve artificial intelligence. When there are enough training samples and enough
computing capability, a neural network can approximate almost any function and answer almost any
question.
Each neuron of an artificial neural network can make simple decisions, just like neurons in human
brains. These decisions are then passed to the next neurons, which are organized in interconnected
layers. The neuron is the basic unit of a neural network; it contains input, output, and computation
functions. There are connections between the neurons, and each connection has a weight associated
with it that is fine-tuned over time.
The activation function is a mathematical equation that determines the output of a neuron. It acts
like a switch that decides whether an input is passed on, based on whether that input is relevant to
the prediction of the model. The weights and the activation function together determine the output
of each neuron. Training a neural network means adjusting the weights toward their optimal values,
so that the prediction of the entire network is as good as possible.
The learning process of a neural network is carried out through its layers. A layer is the highest-level
module in deep learning: it receives a weighted input, computes it with some functions, and passes
the result on; layers that perform computation are called “computing layers.” A single-layer neural
network has two layers, an input layer and an output layer, but only one computing layer, the output
layer. In addition to the input layer and output layer, a multi-layer neural network has multiple
intermediate layers. The intermediate layers and the output layer are the computing layers of the
multi-layer neural network.
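A single neuron can be sketched directly in Python with NumPy. This is an illustrative toy, not part of the thesis workflow; the weights and inputs are invented, and the sigmoid is used as the example activation function.

# Minimal sketch of one neuron: weighted inputs plus a bias, passed through an activation function.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, 0.8, 0.1])        # inputs to the neuron
w = np.array([0.4, -0.6, 0.9])       # connection weights (adjusted during training)
b = 0.1                              # bias

output = sigmoid(np.dot(w, x) + b)   # weighted sum, then activation
print(output)

Training repeats this forward calculation over many samples and adjusts w and b so that the network's predictions improve.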
1.5 Convolutional Neural Network
A convolutional neural network (CNN) is used here as an example to explain the operation of a
neural network. Although a CNN itself is not applied in this thesis, some of its ideas are.
Understanding how CNNs generally work, and the tricks behind the concepts, is helpful for
understanding how a GAN processes images.
A CNN is a powerful tool for image recognition: it scans the image area by area, then identifies and
extracts the important features, which are then used to classify the image. A CNN is composed of
four types of layers with different functions: the convolution layer, the pooling layer, the activation
layer, and the fully connected layer (see the sketch below). Each of these layers can appear more
than once in a convolutional neural network.
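The four layer types can be stacked in a few lines of PyTorch. This is a minimal sketch for illustration, not the network used in this thesis (pix2pixHD is far larger); the layer sizes are chosen only to fit the small black-and-white X/O example discussed below.

# Minimal PyTorch sketch of the four CNN layer types named above.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 2, kernel_size=3),   # convolution layer: two 3x3 filters
    nn.ReLU(),                        # activation layer
    nn.MaxPool2d(kernel_size=2),      # pooling layer
    nn.Flatten(),
    nn.Linear(2 * 3 * 3, 2),          # fully connected layer: scores for X and O
)

image = torch.randn(1, 1, 9, 9)       # one 9x9 single-channel image, like the X/O example
print(model(image).shape)             # torch.Size([1, 2])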
A CNN solves two main problems of AI image processing:
1. The amount of image data to be processed is too large, resulting in high processing cost and low
efficiency.
2. It is difficult for an image to retain its original characteristics during digitization,
resulting in low accuracy of image processing.
A simple but challenging example of CNN image processing goes through the four different types
of layers. In the example, the CNN takes in a two-dimensional array, an image with only black or
white pixels, and decides whether it is a picture of an X or an O (Fig. 1-10).
Figure 1-9: CNN decides whether it is an X or an O
The task is challenging because what the computer sees is a bunch of numbers. Unlike human
beings, with their pattern recognition ability, the computer has to go through the image pixel by
pixel. The X and O in the graphics are not unique: some of them are rotated, some are off
the center of the picture, and their thickness differs (Fig. 1-11). Human beings can still
judge X or O instantly in these cases; for a computer, a different technique has to be applied.
Figure 1-10: The X and O in the graphics
1.5.1 Convolutional Layer
The first stage of a CNN is convolution. In this stage, the CNN breaks the image
down into smaller parts (patches), then uses filters to match each patch. It then becomes much
clearer whether the two things are similar (Fig. 1-12).
Figure 1-11: CNN convolution difference
In this case, the filters are little mini-images, just 3 pixels by 3 pixels (Fig. 1-13).
Figure 1-12: 3x3 filter
Black pixels are set to -1 and white pixels to 1. The little filters on the left side of the figure are
diagonal lines (Fig. 1-14); the middle one is a little X. The source of these small filters will be
introduced later.
Figure 1-13: Filter
The next step is called filtering; it is the math behind the matching. The filtering process lines up
the filter with an image patch, multiplies each image patch pixel by the corresponding filter pixel,
then adds the products up and divides by the total number of pixels in the filter.
To show the real process, take two positions in the X image to filter.
At the first pixel of the first position (Fig. 1-15), multiply the two values: the result is one. Continue
stepping through pixel by pixel, multiplying each pair, and because the patch and the filter are
exactly the same, the answer is always 1 (Fig. 1-15). When this process is done for the whole image
patch, add up all the results and divide by 9. The answer is 1. Then put the number one at the middle
of the image patch.
Figure 1-14: Multiplication result chart one by one
The same process is applied at the second position. The second image patch is not exactly
the same as the filter, and the final answer is 0.55.
Figure 1-15: Different patch with the same filter
By moving the filtering process to every possible position, we get the final result (Fig 1-17). This is
also called a feature map. The numbers show how well the filter matches the image at each
position. A minimal sketch of this computation follows the figure.
Figure 1-16: Feature map
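The filtering arithmetic described above can be written directly in NumPy. This is an explanatory sketch (not thesis code): slide the 3x3 filter over the image, multiply pixel by pixel, sum, and divide by the number of filter pixels, using the -1 (black) / 1 (white) convention from the text.

# Minimal sketch of the filtering step that produces a feature map.
import numpy as np

def feature_map(image, filt):
    fh, fw = filt.shape
    out_h = image.shape[0] - fh + 1
    out_w = image.shape[1] - fw + 1
    fmap = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + fh, j:j + fw]
            fmap[i, j] = np.sum(patch * filt) / filt.size   # average of the products
    return fmap

diagonal_filter = np.array([[ 1, -1, -1],
                            [-1,  1, -1],
                            [-1, -1,  1]])

# A patch identical to the filter scores 1.0; a partial match scores less (e.g. 0.55).
print(feature_map(diagonal_filter, diagonal_filter))   # [[1.]]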
There are two filters to be processed in this example; the final results are the two feature maps (Fig 1-18).
Figure 1-17: Two filters’ feature maps
This action of convolving an image with many features and creating a stack of filtered images is
called the convolution layer.
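To make the filtering arithmetic concrete, the short NumPy sketch below slides a 3 x 3 filter over an image and fills a feature map with the normalized match scores, exactly as described above. The diagonal filter values follow the example in this section; the function name and the use of NumPy are illustrative and are not part of any particular CNN library.

import numpy as np

def convolve(image, kernel):
    # Each output value: multiply the patch by the filter pixel by pixel,
    # add the products up, and divide by the number of pixels in the filter.
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            feature_map[i, j] = np.sum(patch * kernel) / kernel.size
    return feature_map

# The diagonal-line filter from the example: -1 for black pixels, 1 for white pixels.
diagonal_filter = np.array([[ 1, -1, -1],
                            [-1,  1, -1],
                            [-1, -1,  1]])

# A patch that matches the filter exactly scores 1.0, as in the worked example.
print(convolve(diagonal_filter.astype(float), diagonal_filter))   # [[1.]]

A 9 x 9 image of an X would be passed to the same function with each filter to produce the stack of feature maps shown in Figure 1-17.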
1.5.2 Pooling Layer
The second stage of a CNN is pooling. Pooling is used to shrink the image stack; this layer reduces the dimensionality of each filtered image while maintaining the most important information.
Only max pooling is shown in this example. For max pooling, a window size (2 pixels by 2 pixels in this case) is decided first. The window is walked across each filtered image, and the maximum value is taken from each window. After max pooling, the result is a similar pattern but smaller (Fig. 1-19). In this example, the seven-by-seven feature map shrinks to four by four, nearly half the size of the original.
Figure 1-18: Achieving the maximum in each feature map patch
For the computer, shrinking the picture saves computation. It also makes the network less sensitive to position: within each window, the exact location of the maximum value does not matter, which means a feature in the image can shift a little without changing the result.
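A minimal sketch of the max pooling step described above is shown below. It assumes a 2 x 2 window walked with a stride of 2, with partial windows allowed at the edges so that a 7 x 7 feature map becomes 4 x 4 as in the example; the function is written with NumPy for illustration only.

import numpy as np

def max_pool(feature_map, window=2, stride=2):
    h, w = feature_map.shape
    out_h = -(-h // stride)   # ceiling division: 7 rows become 4 pooled rows
    out_w = -(-w // stride)
    pooled = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = feature_map[i * stride:i * stride + window,
                                j * stride:j * stride + window]
            pooled[i, j] = patch.max()   # keep only the strongest response in each window
    return pooled

# A 7x7 feature map (values made up for illustration) shrinks to 4x4.
fm = np.random.rand(7, 7)
print(max_pool(fm).shape)   # (4, 4)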
1.5.3 Activation Function (ReLUs)
In CNNs, the activation function usually follows the convolution layer. The activation function is applied as a unit right after every pixel value, so these units are also called the activation layer.
The activation function is introduced to increase the nonlinearity of the neural network model. In the classification of samples, nonlinear models often have better classification ability than linear models, and their results are more accurate. Without an activation function, no matter how many layers the neural network has, the output is only a linear combination of the inputs. With an activation function, a nonlinear factor is introduced into each neuron, so the neural network can approximate any nonlinear function and can therefore be applied to many nonlinear models.
Rectified linear units (ReLUs), which are small computational units, are widely used in CNNs (Fig. 1-20).
Figure 1-19: Rectified linear units
What the ReLU unit does is step through the feature map and change every negative value to zero (Fig. 1-21).
Figure 1-20: Feature map after ReLU unit
The feature map becomes a sparser matrix. Sparsity removes redundancy in the data while retaining its characteristics as much as possible. Because the neural network repeats its calculations constantly, it is in effect trying to find a matrix that is mostly zeros and still expresses the data's characteristics; thanks to this sparsity, the method becomes faster and more effective.
If the result of the ReLU is passed through the pooling layer, the output is the same as it would be without the activation layer, so why is the activation function after the convolutional layer necessary? If the picture were 9000 by 9000 pixels and zero-valued pixels required no calculation, then the time for the ReLU-processed feature map to pass through the pooling layer would be greatly shortened, which improves efficiency.
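The ReLU step itself is a single operation: every negative value becomes zero and every positive value passes through unchanged. A short NumPy sketch, with made-up values, is shown below.

import numpy as np

def relu(feature_map):
    # Negative values are clipped to zero; positive values are kept as-is.
    return np.maximum(feature_map, 0)

fm = np.array([[ 0.77, -0.11],
               [-0.33,  1.00]])
print(relu(fm))   # [[0.77 0.], [0. 1.]]

Because the zeros need no further computation, the layers that follow can skip them, which is the efficiency gain described above.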
1.5.4 Fully Connected Layer
After several rounds of convolution, activation, and pooling layers, the network finally comes to the output layer (Fig. 1-22), where the high-quality feature image the model has learned is passed to the fully connected layer.
Figure 1-21: Convolution activation to output structure diagram
After the convolutional, pooling, and activation layers extract the features of the picture, the fully connected layer plays the role of a judge that decides whether the picture is an X or an O by reading the final feature map. In other words, every value gets a vote on what the answer is going to be. These heavily filtered and much-reduced feature maps are rearranged into a single list (Fig. 1-23).
Figure 1-22: Every value gets a vote
Each of those values connects to one of the answers it is going to vote for. When a picture of an X is fed in, certain values tend to be high and strongly predict an X; they have a strong weight for the X outcome. Similarly, when a picture of an O is fed into this CNN, certain values at the end tend to be high and strongly predict an O (Fig. 1-24) (the thicker the line, the higher the weight).
Figure 1-23: Certain value has strong weight for the X outcome
For instance, when a new, unknown input goes through every layer of the CNN, it gets a series of votes. Based on the weight each value votes with, a weighted average is computed at the end. In this case, the set of inputs votes for an X with a strength of 0.92 and for an O with a strength of 0.51, so X is clearly the winner, and the neural network categorizes this input as an X.
Moreover, fully connected layers can also be stacked: the output of one vote can be the input of the next. At the far end, the array of values results in a set of final votes for a category.
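The voting described above can be sketched as a weighted sum over the flattened feature-map values. The feature values and the per-category weights below are invented for illustration; in a real CNN the weights are learned during training.

import numpy as np

# Flattened feature-map values: the "single list", every value of which gets a vote.
features = np.array([1.00, 0.55, 0.55, 1.00, 0.55, 0.55, 0.55, 1.00])

# One weight per value and per category (illustrative numbers only).
weights = {
    "X": np.array([0.9, 0.1, 0.2, 0.9, 0.1, 0.2, 0.1, 0.9]),
    "O": np.array([0.1, 0.6, 0.5, 0.1, 0.6, 0.5, 0.6, 0.1]),
}

# Each category's score is the weighted average of the votes; the highest score wins.
scores = {label: float(features @ w) / len(features) for label, w in weights.items()}
winner = max(scores, key=scores.get)
print(scores, "->", winner)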
After defining the neural network with initial weights and performing a forward pass to generate the initial prediction, a loss function defines the distance between the model's prediction and the true answer. Backpropagation is the algorithm that minimizes this loss function.
In this case, if the input is known to be an X, then because X received 0.92 of the vote, its error is 0.08. The vote for O is 0.51, so its error is 0.51. When the two errors are added, the total error is 0.59 (Fig. 1-25).
Figure 1-24: Result sample of fully connected layer
1.5.5 Back Propagation
In a neural network, some parameters, called hyperparameters, need to be adjusted manually, such as the number and size of filters in the convolutional layers and the window size in the pooling layers. The weights inside the convolutional neural network, on the other hand, are adjusted automatically: through continuous training they are continuously optimized. The method
used to adjust them is back propagation, which is the basis of neural network training. After calculating the error from the result, back propagation adjusts the weight values in the reverse direction to reduce the final error.
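The adjustment "in the reverse direction" can be illustrated by a single generic gradient-descent step: each weight is moved a small step against its error gradient so that the loss shrinks on the next forward pass. The learning rate and the gradient values below are placeholders, not the actual settings of pix2pixHD.

def gradient_descent_step(weights, gradients, learning_rate=0.01):
    # Move every weight against its gradient; the step size is the learning rate.
    return [w - learning_rate * g for w, g in zip(weights, gradients)]

weights = [0.40, -0.10]
gradients = [0.50, -0.20]            # d(loss)/d(weight), obtained by back propagation
weights = gradient_descent_step(weights, gradients)
print(weights)                       # [0.395, -0.098]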
1.5.6 Summary
The convolutional neural network is an important tool for image recognition in artificial intelligence. Its structure and its convolutional-layer calculations lay the foundation for the recognition and generation of complex images. This section concretely described the workflow of all the layers in the network through a simple example. These same functions appear inside the two neural networks in a GAN, which combines various layers to achieve the functions its author desires.
1.6 Generative Adversarial Network – Pix2pix & Pix2pixHD
A convolutional neural network is a common discriminative model; it has a picture-recognition function (Yamashita et al. 2018). A GAN is a generative network whose function is to learn from existing data in order to generate similar pictures or perform other tasks (depending on the intention of the GAN's author). A GAN model, however, contains more than one neural network: it is a machine learning model in which a generative network and a discriminative network work against each other.
This section starts from the structure and function of the original GAN and introduces the working principles of its two neural networks. It then introduces the GAN variant that this thesis uses, pix2pix, and the internal neural network structure of its generator and discriminator.
1.6.1 Generative Adversarial Network (GAN)
The principle of the generative model is to capture the data distribution of real samples (the training set), map it to a new data space, and output generated data and a probability distribution that look like the samples in the training set. Simply put, the generative model learns from the existing training set to produce similar data, which can include speech, pictures, text, and so on. The model relies on complex probability calculations, and before the advent of GANs, the results of a single generative model were not ideal; there were many difficulties in the probability calculations.
As mentioned above, there are two neural networks in the GAN model. The generative network is called the generator (G), and the discriminative network is called the discriminator (D). The relationship between G and D is like that of a student and a teacher: G is the student and D is the teacher. The assignments the student turns in without any guidance cannot earn a high grade from the teacher; the student must continuously improve the quality of the homework based on the teacher's evaluation, and the teacher must continuously raise the requirements. This back-and-forth process eventually allows the student to submit homework of the best quality. Improving the quality of the student's homework and the level of the teacher's requirements both require training on the training set (Fig. 1-26).
Figure 1-25: GAN training process
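The teacher-student loop can be sketched as alternating updates of D and G. The sketch below uses PyTorch with tiny placeholder networks and random toy data; it illustrates only the alternation between the two networks and is not the pix2pixHD training code.

import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))   # generator ("student")
D = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))   # discriminator ("teacher")
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

for step in range(200):
    real = torch.randn(32, 2) + 3.0        # stand-in for samples from the training set
    noise = torch.randn(32, 8)             # random vector that triggers G's generation

    # 1. The teacher (D) learns to tell real samples from G's current output.
    fake = G(noise).detach()
    loss_D = bce(D(real), torch.ones(32, 1)) + bce(D(fake), torch.zeros(32, 1))
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # 2. The student (G) improves so that D scores its output as "real".
    fake = G(noise)
    loss_G = bce(D(fake), torch.ones(32, 1))
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()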
Since the emergence of GAN, many variants in different fields have been proposed, and their number increases every year; they include cGAN, DCGAN, pix2pix, and CycleGAN (Fig. 1-27). They have improved the structure, developed the theory, or innovated in application (Guo et al., 2016).
Figure 1-26: Image-generation type GAN
1.6.2 Pix2pix
In the original GAN, the internal structures of G and D were relatively simple, and the quality of generated images had not been significantly improved. One difference to note is that the original GAN is used to generate data, while pix2pix is used to convert pictures. In the GAN model, the input of G is a random vector used to trigger G's generation process; this random vector has little relation to the data G outputs, and the final output depends mainly on the weights of the neurons in G. The function pix2pix needs to realize, however, is picture conversion, which means that part of the information in the input picture can be shared with the output, for example the places where the pixel color changes sharply (boundaries or high-frequency areas), simple color conversions, and so on.
For example, pix2pix can realize the conversion of a picture from day to night (Fig. 1-28). In these two pictures, the contours and positions of all objects do not change; the color is the key to this conversion task. This means that the pixel positions of those contours can be shared. Depending on the type of picture conversion, the information shared between the input and output classes will differ.
Figure 1-27: Example of image translation
The structure used by the pix2pix generator is called U-net. U-net is derived by adding skip connections to the common encoder-decoder structure used in generative networks. The encoder reads the features of the training-set data, and the decoder generates data based on those features; the encoder part of this structure is similar to the feature-map extraction in a CNN (Fig. 1-29). At the same time, part of the information in the encoder and decoder is shared through the skip connections (Isola et al., 2018).
Figure 1-28: Encoder-decoder & U-net
Figure 1-29: U-net detail
The discriminator in pix2pix is called PatchGAN. The PatchGAN discriminator tries to classify whether each patch in an image is real or fake; the discriminator is convolved across the entire image. See the appendix for the specific structure (Isola et al., 2018).
Pix2pix realizes conversion between pictures: given a picture, a trained pix2pix can convert it into another desired picture. Therefore, the training set of pix2pix must consist of a large number of paired pictures, one the input picture and the other the output.
1.7 Summary
This chapter was mainly composed of three parts: campus indoor layout design, building information modeling (BIM), and machine learning.
The study of campus room layouts shows that different types of rooms have different degrees of layout diversity. Classrooms have the highest design diversity, while offices and bathrooms have lower diversity because of design codes; restrooms have the lowest diversity. The higher the diversity, the higher the learning cost for the machine learning model and the more difficult it is to implement.
As an information integration system for the AEC industry, BIM software has strong information-sharing and processing capabilities. This feature can help machine learning obtain the required training set. Through the building information modeling software Revit and its built-in Dynamo, one can extract the needed parameters from the 3D information model of a building; Dynamo can be used to extract training-set images that pix2pix can learn from.
Pix2pix, a GAN-based image-conversion machine learning model, uses its internal generative and discriminative networks to compete with each other and optimize the output. It uses a large number of paired pictures as a training set to learn the conversion between two kinds of pictures, and it can then convert pictures outside the training set. In this case, by learning from the pictures exported by Dynamo, pix2pix can automatically generate an indoor layout given a specified room shape and the positions of the entrance and windows.
2.BACKGROUND RESEARCH
The combination of machine learning and construction is still at the trial stage. Since the development of generative networks themselves was not smooth, most machine learning in the construction field has stayed with prediction and classification problems, such as building energy consumption forecasting. With the advent of GANs, the quality of the results generated by generative networks has continuously improved, which allows machine learning to play a larger role in the construction field.
This chapter first describes artificial intelligence as a new method and its relationship to earlier methods and technologies in the architecture field. Next, it introduces the actual tools and research on BIM software for data processing. It then introduces research on automatic layout, including both non-machine-learning and machine-learning methods.
2.1 AI & Architecture
As a new technology, artificial intelligence did not suddenly break into the field of architecture. It is continuous with earlier modular design, computational design, and parametricism; each period penetrates the others and learns from precedents. This section shows the interweaving and evolution of computing and architecture by exploring past research and inventions.
2.1.1 Modularity
Modularity can be considered the beginning of systematic building design (Chaillou, 2019), and it helps simplify building design. Modularity can be traced back to the concept of "Baukasten" (Big Construction Kit), which consists of about six basic modules of different sizes that are combined according to the needs of residents.
Later, Le Corbusier proposed the Modulor, a system of mathematical proportions inspired by the human body and intended to improve the appearance and function of buildings. Corbusier described it as a "range of harmonious measurements to suit the human scale, universally applicable to architecture and to mechanical things" (Cohen, 2014).
These theorists brought early architects to the concept of modularity (Chaillou, 2019). Modularity has the advantages of less trouble, lower cost, and higher predictability.
Modular thinking has influenced not only architecture, but urban areas, furniture design, and
specific room design. This was done before the advent of computer use in these fields, but has
continued since then in different ways.
2.1.2 Computational Design
The rapid development of computer technology has had a strong impact on architecture. The early 1980s marked a systematic revival of rule-based architectural design, but in fact, as early as the mid-1950s, some engineering offices had begun to conduct basic analyses of the potential of computer design.
In 1959, Professor Patrick Hanratty released PRONTO, which was the first prototype of computer-
aided drawing software for designing engineering components. Soon after, Christopher Alexander,
a professor of architecture at the University of California, Berkeley, proposed a key principle of
computer design: the "object-oriented programming" paradigm. He theorized the reason and
method of using computers as part of architectural design.
In 1968-1970, with the release of Urban 2 and Urban 5, the team AMG created by Nicholas
Negroponte at MIT demonstrated the potential of CAD for space design.
Building on this momentum, architects and the entire industry actively transformed these inventions into a large number of innovations.
Architect Frank Gehry was one of the strongest advocates of this cause. He believed that the application of computing could greatly relax the boundaries of the system and give new forms to buildings. Gehry Technologies, founded by Frank Gehry, created opportunities for computer-aided design over the following three decades and demonstrated the value of CAD to architects.
However, the shortcomings eventually appeared. In particular, the repetitiveness of certain tasks
and the lack of control over complex geometric shapes have become serious obstacles. Faced with
these limitations, a new paradigm has emerged outside of CAD: parameterization.
2.1.3 Parametricism
Parametricism is defined both as a style in the visual sense of the word and as a process-based architecture (Schumacher, 2012). Its biggest advantage is that it can create complex building shapes while avoiding a great deal of heavy, repetitive work; the calculation can be written into a program and run automatically (Chaillou, 2020). In the 1960s, architect Luigi Moretti's project "Stadium N" defined 19 parameters, including the audience's field of vision and the sun's exposure to the stands. By establishing a program relating the 19 parameters to the final shape and setting the values and ranges of those parameters, the shape of Stadium N was finally generated (Bucci & Mulazzani, 2002).
In 1988, Pro/Engineer, the first software that gave users full access to geometric parameters, came out. Parametricism can be summarized in the original words of Pro/Engineer founder Samuel Geisberg:
"The goal is to create a system that would be flexible enough to encourage the engineer to easily consider a variety of designs. And the cost of making design changes ought to be as close to zero as possible."
With the emergence of Grasshopper, Dynamo, and various BIM software, parametric design has raised the rationality and feasibility of the construction industry to a whole new level. Still, the construction field needs a tool that can deal with more parameters and handle more complex problems. The emergence of neural networks presents new opportunities.
2.1.4 Artificial Intelligence
AI is essentially a method of statistics and probability, and it responds to the limitations of parametric architecture. As introduced in Chapter 1, the neural network is an artificial intelligence algorithm. As the computing power of computers grows, neural networks become more complex and their processing ability grows stronger. They have the ability to learn and can grasp correlations between parameter changes and result changes through their different layers, associations that cannot be captured by ordinary single parameters.
Although AI shows great promise for architecture, it still depends on the designer's ability to convey intent to the machine, and the machine must be trained to become a reliable "assistant." The challenges at this stage are obtaining a suitable and large training set and finding suitable artificial intelligence algorithms and tools.
2.1.5 Summary
This section briefly explained some important innovations and examples from each stage of this development, analyzed the bottleneck of each stage, and showed how each bottleneck was broken by the next stage: from the initial hope of reducing the complexity of buildings through modularization, to architects attempting to control buildings and cities through simple known parameters as the computer field developed, to finally exploring unknown parameters through artificial intelligence. There are new opportunities for design exploration, and the potential of artificial intelligence is worth looking forward to.
2.2 BIM Customization — IMAGINIT Clarity
BIM contains massive amounts of data, especially for architectural design companies; without proper management, the problems caused by cluttered data can be catastrophic. The BIM model itself offers great flexibility in information processing. As construction projects develop, BIM data accumulates over time and grows many times over. How to store, manage, and make good use of this digital building information is a problem many building design companies face.
At present, an efficient and common method is to introduce a BIM-based data management platform as the front end of BIM data management and to access the database of the BIM software through its API (Application Programming Interface) in order to read and modify building information.
In this case, the DLR Group uses a BIM-based data management platform called IMAGINiT Clarity. The IMAGINiT Clarity product suite optimizes the efficiency of the BIM team by automating manual work. All building project data is automatically extracted to this platform (Fig. 2-1). With a high level of visualization, users can easily filter and select the information they need.
Figure 2-1: IMAGINiT
For example, users can use the function of Project Query to export the data of a specific room from
a specific project (Fig.2-2& 2-3).
Figure 2-2: Project Query interface
Figure 2-3: Project selection
Due to the large dimensions and volume of construction project data, users can choose to export only part of the required field data to speed up system reading (Fig. 2-4).
Figure 2-4: Selection interface
All extracted data is presented in the form of a spreadsheet (Fig. 2-5).
Figure 2-5: The spreadsheet of extracted data
In the background, IMAGINiT Clarity actually uses SQL, a database query language, to query all data meeting the requirements in the BIM database according to the selections of the front-end user (Fig. 2-6).
Figure 2-6: SQLQuery
Meanwhile, IMAGINiT Clarity does not only extract information from the BIM database in one direction; it also allows users to modify the related information through the front-end interface and then synchronize the data back to the BIM database (Fig. 2-7 & 2-8).
Figure 2-7: Data sheet of one room
Figure 2-8: Edit room’s data
2.3 Automated Room Layout By GANs
Facility layout (spatial planning) is a task that includes complex decision-making activities that
apply expertise to decision-making, planning and creativity (MGI, 2017).
Automation can improve cost savings, reliability, and productivity by systematizing repetitive tasks, while allowing humans to focus on higher-value and more complex tasks (Anderson et al., 2018). In recent years, the development of artificial intelligence has opened up a new situation for the automation of facility layout. Traditional automatic layout in the building field requires establishing algorithms, with all detailed instructions given to the computer through a programming language. The emergence of artificial intelligence methods has essentially changed the solutions to this type of problem. On the surface, both approaches solve problems through algorithms; the difference is that in traditional automation the rules of facility layout are analyzed and discovered through manual research, while artificial intelligence uses computers to discover the rules from data.
2.3.1 pix2pixHD for Apartment
Pix2pixHD is an upgraded version of pix2pix. With a similar internal structure, pix2pixHD can generate images with many more pixels, which means higher-quality pictures, but it also requires more powerful hardware. Pix2pix was first used here for the generation of indoor images. In this experiment, the training set pairs each indoor image with its outline shape; the purpose is that, given an input black shape, pix2pix can output a room layout of the same shape (Fig. 2-9).
Figure 2-9: Pix2pix translate the image of room shape to room floor plan
There are two training sets in the experiment: one of Japanese apartment floor plans with 1,279 pictures and one of Chinese apartment floor plans with 112 pictures. The two training sets are trained separately, so the final result is two trained pix2pixHD models.
The first step is to create the paired training set. Since the purpose of the experiment is to generate a plan from the boundary of the apartment plan, a silhouette map of each original image must be created to complete the training set. In the experiment, the original pictures were obtained from the Internet through crawlers (a tool for obtaining information from web pages in batches) in png format. In order to make their sizes consistent, each picture was resized, and all pixels inside the outline were blackened. In this way, the conversion process produced the two paired pictures used as the input and output of pix2pixHD (Fig. 2-10).
Figure 2-10: Training set processing
It should be mentioned that the Japanese apartment floor plans contain text annotations, which causes the final output to contain some unrecognizable text. Ideally, these words should also be removed.
Pix2pixHD is open-source shared code with a fixed neural network and GAN structure; most of its content has already been set up, so it can be regarded as a tool that can be used directly, although some parameters (hyper-parameters) need to be adjusted manually. In the experiment, after 50 epochs the default learning rate could no longer improve the output of pix2pixHD, and it needed to be adjusted manually.
The training equipment was an Nvidia Titan X GPU. The Japanese apartment training took 33 hours, while the Chinese apartment training took 2.7 hours.
In the final result, the test results of the Japanese apartments are very unstable. On the one hand, the training pictures contain unnecessary text; on the other hand, the styles of the apartment pictures are not completely uniform. Some indoor pictures use different colors to mark different spaces, but this color labeling does not apply to all the images in the training set (Fig. 2-11).
Figure 2-11: Training process result from different epoch
The image quality of the Chinese apartment test results is very low, but the locations of the interior furniture are mostly reasonable. The low quality of the output pictures may be because the training set is much smaller than that of the Japanese apartments. The reasonable furniture and spatial layout suggest either that Chinese apartment layouts are relatively uniform or that the small training set has too little diversity.
Figure 2-12: Chinese apartment
It can be seen that the diversity, quantity, and uniformity of the training set all affect the final result. Fuzzy output is normal; elements that need to be precisely positioned should have a large color difference from the background, since only then will they have a clear outline.
The training process may happen more than once, and each time the user may need to repeatedly adjust the hyper-parameters based on the results to obtain the best output.
At the same time, training requires capable equipment; ordinary equipment cannot support the training of image machine learning models.
2.3.2 ArchiGAN
ArchiGAN is a tool for generating floor plans of single-family houses (Chaillou, 2019). Its essence is to complete a more reasonable automatic design by chaining three pix2pix models together. The three stages are contour generation, color (room) filling, and furniture layout.
Figure 2-13: Stacked Pix2pix model
First, the model learns the outlines of houses from Boston city data and automatically generates a reasonable footprint within a given parcel (Fig. 2-14).
Figure 2-14: Specific-area to floor-shape translation
Second, according to the given shape, different color blocks, that is, different types of rooms, are filled in.
Figure 2-15: Part of training result from Model II
Finally, the color patch map is converted into an indoor floor plan (Fig.2-16).
Figure 2-16: Training results from Model III
This article provides many new ideas. The pix2pix model does not really understand room types and their functions the way humans do during training; instead, a color (RGB code) is used to stand for each room type, so that in the third step an association is established between the color (room category) and the facilities. This method can also be applied to other applications, and in this thesis the facilities will likewise be replaced with different color patches. In the article, the author notes three limitations of ArchiGAN:
1. When creating a multi-story building, the model cannot understand the concept of a load-bearing wall, so the structural rationality of the whole building cannot be guaranteed.
2. The resolution is too low. This paper was published after Hao Zheng's work described above, but the technology used is still pix2pix, not pix2pixHD; one factor is that pix2pixHD consumes more computing power.
3. In terms of output format, the generated files are non-vector images, which means the results stay at the conceptual design (draft) stage and cannot be directly converted to CAD for use.
2.3.3 Summary
From these previous pix2pix applications, the limitations of the technique can be understood. The results generated by pix2pix usually do not have high definition, the conversion result also depends on the hardware, and the results are usually not directly processable. At the same time, some novel methods have been proposed: defining the spaces of a picture through color, and nesting models so that the artificial intelligence can make logical, layered judgments. These inspirations are applied in this thesis.
2.4 Summary
The encounter between architecture and artificial intelligence did not appear suddenly. Over the past century, architects attempted to standardize building design through modularity and to integrate computers and architecture; then, by controlling parameters, parametric design was realized; and now artificial intelligence tries to find further correlations among parameters. This development follows a regular progression. The widespread use of BIM provides better conditions for machine learning data: with the help of BIM software, exporting a pix2pix training set becomes possible. Pix2pix builds on previous research, and there are methods that can be used to annotate spaces or facilities and to improve the quality of the output results.
3. METHODOLOGY
This chapter is divided into two parts. The first part is based on the assumption that the trial run of the self-built Revit model would work in practice, which ignores the massive amount of real data and the complexity of the models. The second part is an improved method developed after trying the original data; there is a significant discrepancy between the actual operation of the Dynamo code and its intended operation.
3.1 Theoretical methodology
This methodology provides a method for data extraction and cleaning of machine learning models
under almost ideal conditions. It assumes that the Dynamo code will work in a reasonable amount
of time with a large data set of Revit models. In order to get access to the Revit models, a person
at the architecture firm supplying the data from the Revit files was required to run the Dynamo
code, not the author of the code. This presented several problems in practice. The original
methodology ignored the complexity of the models and the number of models. In actual operation,
time cost as an essential factor ultimately negated the overall usefulness of the theoretical
methodology. However, it is useful to understand it as a basis of the actual methodology used (Fig.
3-1).
Figure 3-1: Theoretical methodology
The theoretical methodology is divided into five parts with multiple sub-parts:
1. Export the pictures needed (office spaces with furniture) from the Revit data for training through Dynamo.
2. Organize the pictures in batches through Photoshop to meet the requirements of the pix2pixHD training set.
3. Train pix2pixHD Model I.
4. Train pix2pixHD Model II.
5. Output the prediction result of pix2pixHD Model I to Revit through Dynamo to create a Revit 3D model.
Three sets of pictures should be exported (Fig. 3-2).
Figure 3-2: Exported images
To understand why the three sets of images have to be exported, one must understand the functions that pix2pixHD needs to implement for this application. The functions that the pix2pixHD I and pix2pixHD II models need to implement are:
1. Predict the type and location of facilities in a room based on the shape of the designated room and the location of the entrance and windows, and
2. Generate office floor plans with clear image quality based on the results of pix2pixHD Model I.
To achieve these two functions through training, it is necessary to export two sets of training sets
(training set 1 and training set 2) through Dynamo. Training set one contains groups 1 and 2.
Training set two contains groups 2 and 3.
• Group 1 includes the shape of the room, the position of the entrance (door), and the window.
• Group 2 includes the content of the first set of pictures plus the type and location of the
facilities.
• Group 3 are the original floor plans.
3.1.1 Extract Training Set Images through Dynamo (1)
The Dynamo code has several important sections (Fig. 3-3).
Figure 3-3: Full Dynamo code
To obtain images of a large number of offices from Revit, it is necessary to create views of each office in batches through Dynamo. But the first step is not to make the views; it is to hide in advance the various elements that are not needed in the views, such as extra lines and reference grids (Fig. 3-4).
Figure 3-4: Partial Dynamo code
Room is a category in Revit, and Dynamo can extract all the elements in a category with the node All Elements of Category (Fig. 3-5).
Figure 3-5: Extracting elements by category in Dynamo
After selecting the rooms, the offices need to be filtered out. In the Revit data (because it came from many files and users were not consistent), all rooms are named, but the name formats differ, the capitalization is not uniform, and some use abbreviations, for example "Off" for office. The code is written to ignore case. The node String.Contains can achieve this (with Lacing set to Cross Product). The structure of the output list of String.Contains depends on the number of search keywords and the structure of the searched list. For a string list with a simple single-layer structure, each string yields one sublist from String.Contains. This sublist contains Boolean values, where True and False indicate whether the string contains each keyword. As long as the sublist has at least one True, the space represented by that room name should be selected (Fig. 3-6).
Figure 3-6: Subset of code to find offices
After this, the node Boolean.AnyTrue converts each sublist into a single Boolean, and the required rooms are filtered out through the node List.FilterByBoolMask (Fig. 3-7).
Figure 3-7: Subset of Dynamo code to finish selecting a room
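The keyword filtering that the String.Contains, Boolean.AnyTrue, and List.FilterByBoolMask nodes perform can also be written out in plain Python, which makes the logic easier to follow. The room names below are invented for illustration.

keywords = ["office", "off"]                                          # search keywords, case ignored
room_names = ["Office 101", "OFF. 203", "Conference A", "Restroom"]   # illustrative names

# String.Contains with cross-product lacing: one True/False per keyword per room name.
contains = [[kw in name.lower() for kw in keywords] for name in room_names]

# Boolean.AnyTrue: keep the room if at least one keyword matched.
mask = [any(flags) for flags in contains]

# List.FilterByBoolMask: filter the original list with the mask.
offices = [name for name, keep in zip(room_names, mask) if keep]
print(offices)   # ['Office 101', 'OFF. 203']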
After obtaining the desired rooms, the next step is to create a view centered on each room. A floor plan view is created through the node FloorPlanViews.ByRoom. The boundaries of these rooms are obtained at the same time and converted into curves. Finally, only the view within the curve is displayed through View.SetCropBoxByCurves (Fig. 3-8), and the view of the room is obtained (Fig. 3-9).
Figure 3-8: Dynamo code to crop the view
Figure 3-9: Current output of code
After obtaining the view of the room, the next step is to set the colors of the view to meet the requirements of the training set. For the pix2pix model to predict which facilities should be in the room and where, one set of pictures has to provide the shape of the room and the location of the entrance and windows, and the other set has to provide all the information in the first set plus the type and location of the facilities. To accomplish this, abstract color-block images of the rooms are needed (Fig. 3-10); being abstract improves the training speed later in the process.
Figure 3-10: sample color block images
In this step, the room boundary curve sometimes cannot be closed because there are columns in the room, so before selecting the room boundary it is necessary to modify the Room Bounding setting of all columns. Room Bounding is a parameter, and it can be changed in batches through the node Element.SetParameterByName (Fig. 3-11). However, when the Room Bounding change and the other Dynamo code are in the same file, it cannot be guaranteed that Room Bounding will run first (even with a Passthrough node). Therefore, this part of the code has to be separated from the main part into another file and run first.
Figure 3-11: Portion of Dynamo code for setting parameters
The visual style of a view can be changed in Revit, and the Shaded visual style automatically turns all walls and floors gray in the view. The visual style can be changed in batches through Dynamo with the node Visual Style Select; the integer entered represents the visual style, and the number 3 represents Shaded (Fig. 3-12).
Figure 3-12: Portion of code for setting visual style
Windows and doors have their own categories in Revit, so no special screening is required; even if their model names differ, they will all be selected (Fig. 3-13).
Figure 3-13
Due to the special shapes of doors and windows, some of them do not have an apparent cross-section. Therefore, to achieve generality, it is necessary to create solid blocks and set their colors. The node Element.BoundingBox creates a virtual box that encloses the element; the node BoundingBox.ToCuboid then converts that bounding box into a solid, and finally DirectShape.ByGeometry turns it into an entity in the Revit view (Fig. 3-14). By giving this entity a material and defining a category that can be displayed in the view, the color appears in the view. This ensures that the doors and windows can be turned into rectangles and given a color (Fig. 3-15).
Figure 3-14: Code for setting color for windows and curtain walls
At the same time, to create Group 1, all furniture needs to be hidden (Figure 3-15).
Figure 3-15: Hide elements in the views
Figure 3-16: Sample image of doors and windows converted to colored rectangles
For a variety of reasons, the names of the furniture are not consistent:
1. Furniture is named by people, and finding a chair is difficult when "chair" is the only keyword available.
2. Not all rooms have furniture models.
3. Within one project model, offices are repetitive, so many projects only model the furniture layout in a few offices rather than in every office with the same layout.
This database must be filtered by keywords, as was done for the rooms. And to reduce program crashes or excessive running time, it is necessary to avoid meaningless calculations.
This section consists of two parts: filtering, and coloring the selected families. The screening steps are similar to those for room screening, both using keyword search, but the names and the number of facilities are much more complicated than those of the rooms. In order to understand the characteristics of the facility names, Excel tables of all rooms and furniture were exported in advance (Fig. 3-17).
Figure 3-17: Part of Excel database showing names of furniture in rooms
By screening the Excel table, one can learn the usual names of tables and chairs. Depending on the complexity of the names, the screening process for this part may change.
The node Select.ByCategoryAndView is used to select only the furniture in the selected rooms. Through the screening above, one can select the furniture that is needed, such as tables and chairs (Fig. 3-18).
Figure 3-18: Part of code for filtering the furniture
There are two ways to color the furniture. The first is the same as the method for windows and doors: create entities in Revit and define colored materials for them. However, since many entities may overlap and cause Revit to report errors and crash, a different method was tried. The second method is to color using the override command. The only similar node Dynamo has is Overrides.InView, but it only works in the active view, so this function is implemented through a custom Python node (Fig. 3-19).
Figure 3-19: Python script to override the color of an object
In the Python code, the first part imports the instruction libraries, the second part defines the inputs, and the third part operates on the inputs. SetSurfaceForegroundPatternId and SetSurfaceForegroundPatternColor modify, respectively, the type and the color of the surface foreground pattern of the selected target. Since the main purpose of this step is to color, the most common pattern, Solid, is set by override.SetSurfaceForegroundPatternId(Pat[0], Id); the solid pattern's id is 0 (the first one in the pattern list) in Revit. Color(255, 0, 0) represents red. Figure 3-20 shows the Override interface in Revit with the result of running the Python code.
Figure 3-20: Result of Python code in Revit
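A sketch of what such a custom Python node might look like is given below. It assumes the Revit 2019+ API names for surface pattern overrides (SetSurfaceForegroundPatternId and SetSurfaceForegroundPatternColor), the standard Dynamo transaction wrapper, and node inputs IN[0] = view, IN[1] = elements, IN[2] = the Solid fill pattern element; the actual node used in this thesis is the one shown in Figure 3-19, and this sketch is only an approximation of it.

# Dynamo Python node (sketch): override the surface pattern of selected elements to solid red.
import clr
clr.AddReference("RevitAPI")
clr.AddReference("RevitNodes")
clr.AddReference("RevitServices")
import Revit
clr.ImportExtensions(Revit.Elements)
from Autodesk.Revit.DB import OverrideGraphicSettings, Color
from RevitServices.Persistence import DocumentManager
from RevitServices.Transactions import TransactionManager

doc = DocumentManager.Instance.CurrentDBDocument
view = UnwrapElement(IN[0])               # assumed input: the room view
elements = UnwrapElement(IN[1])           # assumed input: the furniture elements
solid_pattern = UnwrapElement(IN[2])      # assumed input: the "Solid fill" pattern element

settings = OverrideGraphicSettings()
settings.SetSurfaceForegroundPatternId(solid_pattern.Id)
settings.SetSurfaceForegroundPatternColor(Color(255, 0, 0))   # (255, 0, 0) is red

TransactionManager.Instance.EnsureInTransaction(doc)
for element in elements:
    view.SetElementOverrides(element.Id, settings)
TransactionManager.Instance.TransactionTaskDone()

OUT = elements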
To summarize, by filtering the view, setting the view's visual style, creating DirectShapes, and overriding colors, it is possible to color the facilities in the view (Fig. 3-21). The purpose of the color blocks is to save machine learning time and improve the efficiency of training.
Figure 3-21: Doors, windows, and furniture set to color blocks
When the required views have been created, the next step is to export an image of each view using the node Export Image By View in the Archilab package (Fig. 3-22). The parameters that can be set in this node include pixel size and image resolution, but after testing and comparison, only two of the parameters affect the quality of the final generated image, namely _zoomFitType and _pixelSize. Beyond image quality, the more important point is to keep every view at the same scale. This ensures that the machine learning model can recognize the size of the room: large rooms can have more furniture, while small rooms have correspondingly less.
Figure 3-22: Node export image by View in ArchiLab package
The final export is a color image file (Fig. 3-23).
Figure 3-23: Result of Dynamo code
3.1.2 Data Cleaning (2)
After the picture filtering is finished, all input pictures for pix2pixHD must have a consistent size and number of pixels. However, the aspect ratio of each image exported in the previous step follows the shape of its room. Assuming that the content size used for machine learning is 200 x 200 pixels, all pictures need to be reduced by the same amount and placed on a 256 x 256 white background, leaving white edges around the graphics (Fig. 3-24). The number of exported pictures is large, so batch operations are done with Photoshop: by recording the steps that modify one picture, the recorded operations can be performed on all files in the same folder, and the cleaning and completion of the whole training set can be finished in batches.
Figure 3-24: Original and final images set to same standard size and aspect ratio.
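The same resize-and-pad operation that Photoshop performs as a batch action could also be scripted. The sketch below uses the Pillow library, a placeholder scale factor, and hypothetical folder names; it is only an illustration of the intent (every image reduced by the same ratio and centered on a 256 x 256 white canvas), not the process actually used.

from pathlib import Path
from PIL import Image

CANVAS = 256      # pix2pixHD input size used here
SCALE = 0.5       # placeholder: the same reduction ratio for every image, chosen so the largest room still fits

def pad_to_canvas(src, dst):
    img = Image.open(src).convert("RGB")
    img = img.resize((int(img.width * SCALE), int(img.height * SCALE)))
    canvas = Image.new("RGB", (CANVAS, CANVAS), "white")   # white background and white margins
    offset = ((CANVAS - img.width) // 2, (CANVAS - img.height) // 2)
    canvas.paste(img, offset)
    canvas.save(dst)

# Hypothetical folders: every exported view gets the same treatment.
for path in Path("exported_views").glob("*.png"):
    pad_to_canvas(path, Path("train_A") / path.name)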
3.1.3 Pix2pixHD Training (3)(4)
Pix2pixHD is an open-source model improved by Nvidia on the basis of pix2pix; it can learn higher-resolution training sets and generate higher-resolution results, but it demands more computing power. Pix2pixHD has a minimum requirement of 12 GB of graphics card memory. Pix2pixHD is published on GitHub and can be applied directly by downloading it from https://github.com/NVIDIA/pix2pixHD.
pix2pixHD can be run on a leased platform; in this case Paperspace was used. The leased platform is a Linux system, and the graphics card is an Nvidia Quadro P6000. All operating instructions are given through shell commands (Fig. 3-25).
Figure 3-25: Paperspace platform startup interface
Paperspace's platform can be visualized and is more user-friendly than other rental platforms. The rented platform can be considered a new computer (Fig. 3-26). Because this platform is aimed mainly at machine learning, it comes with the libraries that machine learning requires already prepared. Shell is a command language: a set of operating instructions that can replace the mouse and keyboard in a Linux system, including opening files, downloading content from a specified link, and so on, by typing commands. In the blank Linux system, a new folder named pix2pixHD is created through the command mkdir pix2pixHD; it is opened through cd pix2pixHD; and then, through the git clone command, all the content of the pix2pixHD repository posted on GitHub by Nvidia is copied and downloaded into the pix2pixHD folder.
Figure 3-26: Rent Service Desktop
The pix2pixHD released by Nvidia contains its own training set, which can be deleted, and a new training set was added. The training set needs to be uploaded via Google Drive and downloaded through the rental platform's browser. According to the official instructions, the training set folders must be named train_A and train_B, and every paired image should have the same file name.
Because the training set is processed manually, some model parameters need to be set in advance. First, no instance maps are needed, so reading instance maps should be turned off. Second, pix2pixHD has to read each pixel's RGB color, so label_nc is set to 0 according to the official guidelines (Fig. 3-27). Other parameters can also be set; each parameter command and comment used is listed in Appendix A.
Figure 3-27: The hyper-parameters setting in Shell language
The hyperparameters are set as options on the same shell command, python train.py, that runs the training.
During the training process, the generator and discriminator loss values of each epoch are recorded. The loss values of G and D will not converge, because the two neural networks are fighting each other: when one side's loss gradually decreases, the other side's loss should increase. Ideally, as the training epochs continue, the generator's loss should keep decreasing and eventually stabilize, while D's loss keeps growing until it is impossible to distinguish whether the content generated by G is real or fake.
One complete pass through the training set is called one epoch. In addition to recording the loss values of G and D, pix2pixHD also samples a random training example to show the result generated by G (Fig. 3-28).
Figure 3-28: Each group from left to right: Group 1(input), Group 2(output answer), output from
the generator
After the training is completed, the model can be tested. According to the hyperparameter settings used, the trained G is saved every 5 epochs (the default is 10), and all of these checkpoints can be tested by importing the test set (Fig. 3-29). The results of the tests are presented and analyzed in Chapter 5.
Figure 3-29: G and D trained models saved every 10 epochs
3.1.4 Remap the facilities by Dynamo (5)
Dynamo still has great difficulty in reading pictures, and it needs to convert all images. Through
the conversion of the information, the predicted furniture category and location can be obtained
(Fig. 3-30).
Figure 3-30: Step of the methodology that is difficult to carry out
3.2 Actual Methodology Used
Section 3.1 described the theoretical methodology. This section explains the methodology that was
actually developed as the research progressed (Fig. 3-31).
Figure 3-31: Methodology used
When trying a test of the first methodology, the export time was discovered to be much, much too long, especially since the Revit database could only be accessed by someone at the architecture firm and not by the researcher directly. The complexity of the models also made the quality of the final generated images uneven. Given a limited amount of time, another methodology was developed, based on several assumptions, and the proportion of Dynamo used in the training set export process was reduced. The layouts of doors, windows, and facilities in the training pictures were created in Photoshop through manual and batch operations. The pix2pixHD training part is consistent with the original methodology.
After many tests, Dynamo still could not reduce the export time, so it was finally decided to delete from the original Dynamo code all the parts that deal with doors, windows, and furniture. Only the export of the basic floor plan is retained; the other two groups of pictures are produced manually in Photoshop from the exported floor plan.
The actual methodology used is divided into five parts. Compared with the theoretical one, the training part has not changed, while the first and second parts have undergone significant changes, and Photoshop occupies the main part of the training set production.
Compared with the theoretical methodology, the Dynamo portion of the actual methodology is simpler: a large number of parts are deleted, and only the preprocessing part and the export step for the office floor plans are retained. All other operations are replaced by manual and automatic batch operations in Photoshop (Fig. 3-32).
Figure 3-33: Detailed difference in theoretical methodology (above) and actual methodology (below)
3.3 Summary
The Dynamo code process is theoretically feasible, and customized nodes make it feasible even when no ready-made node exists, which makes the whole process flexible, like stacking blocks into whatever shape is wanted. The first set of problems encountered had to do with the slowness of Dynamo in extracting information from the Revit database and, more importantly, the inconsistency of the original data (e.g., furniture names). This should have been expected, but was not. Dynamo code that is feasible on one Revit file does not mean it can run perfectly on all projects; the inconsistency meant the Dynamo code could not be run without manual operation. A completed Revit project with furniture is usually a huge file, and running Dynamo on it takes hours per project. In addition, a project has many offices but the layouts are repeated, which leads to a situation where a lot of time is spent on a project yet only a few offices with different layouts are obtained. This reflects the fact that the complexity of different Revit models is completely different: in different projects, different elements need to be hidden when creating views, and the furniture families used are not exactly the same. Designing code that can run perfectly on different projects requires constant trial and debugging, which creates considerable difficulties, especially for a researcher without direct access to the data, and it lays the groundwork for spending a lot of time running code without getting the desired result.
The new process was heavily manual, with much more emphasis on Photoshop and batch
processing than was originally intended. Additional training data was also created in Photoshop
with the use of rotating and mirroring of images obtained from the Revit database.
Step 5 of the original methodology, "Output the prediction result of pix2pixHD Model I to Revit through Dynamo," was not even attempted (Fig. 3-34). It would have resulted in Revit models of the newly generated office spaces, but the lack of a consistent naming convention prevented its completion.
Figure 3-34: Actual methodology used
Compared with the training set generation, the training process is relatively stable: better equipment leads to faster training and better quality, and the quality of the results generated during testing is also high. The training set is the key to machine learning. Chapter 4 details the development of the training set and Steps 3 and 4, and Chapter 5 analyzes and compares the generated results.
61
4. DETAILED ACTUAL METHODOLOGY
Chapter 4 introduces the details of creating the training sets in the actual methodology and a future work sample, including exporting pictures with Dynamo, processing pictures in Photoshop, and placing office furniture in the Revit 3D model (future work). Each part has its results displayed (Fig. 4-1).
Figure 4-5: Actual methodology
4.1 Revit Data to Images
Compared with the derivation step of Dynamo in the theoretical, original methodology, the actual
methodology removes the steps of extracting and processing furniture, doors and windows. The
overall steps are relatively simple (Fig. 4-2).
Figure 4-6: Actual methodology step 1
There are two Dynamo files. The first part is mainly used to adjust the Room Bounding parameters
of columns and walls. The second part is used to generate and export separate room floor plans as
images (Fig. 4-3).
Figure 4-7: Dynamo code in 2 files
4.1.1 Part I: Parameter: Room Bounding
The reason for the separation is that Room Bounding must be set before the room views are generated, but the existing methods cannot guarantee that the Room Bounding step runs first. Room Bounding is a Revit parameter; it is a Boolean value (yes or no) that can be viewed in an element's Properties palette in Revit (Fig. 4-4).
Figure 4-8: Property interface in Revit with Parameter Room Bounding
If Room Bounding is not set in advance, the boundary of a column inside a room prevents the room view from being generated. Columns inside rooms are very common in architecture. In Revit, unclosed curves can still be marked as rooms, but such rooms cannot be recognized by Dynamo to generate views (Fig. 4-5). In the second part, after the floor views are created in batches, each room view is cut out by reading an existing (or newly set) closed boundary line, so a closed curve is a necessary condition for batch creation of room views.
Figure 4-9 : (Left to right) 1. Room in Revit 2. Boundary line of the chosen room in Dynamo
without operation of room bounding 3. Error from unclosed boundary for the view creation
To change the Room Bounding parameter of columns to False through Dynamo, one obtains all columns with All Elements of Category and changes the value with Element.SetParameterByName, entering the string "Room Bounding" as the parameter name and a False value through the node Boolean (Fig. 4-6).
Figure 4-10 : Whole nodes for changing room bounding of columns
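The same operation can also be expressed in a single Dynamo Python node. The sketch below is only a minimal illustration under assumptions, not the graph used in the thesis: it assumes architectural columns expose a parameter displayed as "Room Bounding" (structural columns would use OST_StructuralColumns instead).

```python
# Minimal Dynamo Python-node sketch (illustrative, not the thesis graph):
# set "Room Bounding" to False for every column in the model.
import clr
clr.AddReference('RevitAPI')
clr.AddReference('RevitServices')
from Autodesk.Revit.DB import FilteredElementCollector, BuiltInCategory
from RevitServices.Persistence import DocumentManager
from RevitServices.Transactions import TransactionManager

doc = DocumentManager.Instance.CurrentDBDocument
columns = FilteredElementCollector(doc)\
    .OfCategory(BuiltInCategory.OST_Columns)\
    .WhereElementIsNotElementType().ToElements()

TransactionManager.Instance.EnsureInTransaction(doc)
changed = []
for col in columns:
    # assumes the parameter is displayed as "Room Bounding" on these elements
    param = col.LookupParameter("Room Bounding")
    if param and not param.IsReadOnly:
        param.Set(0)   # yes/no parameters are stored as integers; 0 = False
        changed.append(col)
TransactionManager.Instance.TransactionTaskDone()

OUT = changed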
4.1.2 Part II :Export Views
In the second part for exporting views, four steps are included (Fig. 4-7):
1. Creating room views
2. Hiding auxiliary lines
3. Setting the visual style of the view
4. Export views
Figure 4-11: Part II: Export views
4.1.2.1 Creating the Room Views
The first step is to create a view with the border of the room as the boundary (Fig. 4-8). First, one gets all the rooms through the combination of Categories and All Elements of Category. However, only the offices have to be extracted. In the firm's original data, the offices are usually named office, so the rooms can be screened by their names.
Figure 4-12: Dynamo code for creating the view
Element.Name extracts the names of the selected elements and outputs them as a list. The node String.Contains follows Element.Name and requires three inputs: the list to be tested, the keyword(s) to be searched for, and a Boolean controlling whether case is ignored. According to the Excel file that the firm previously exported, some offices are named with the abbreviation OFF, so searching for "off" while ignoring case covers both spellings. To increase the versatility of this step, these nodes also support multiple keyword searches. Depending on whether the input is a single keyword or a list, the structure of the list output by String.Contains differs: a single keyword produces a two-level list, while a list of one or more keywords produces a three-level list (Fig. 4-9). Taking twelve rooms as an example, where only two are named office and the rest are named room: with a single keyword (an element, not a list) the output list has two levels; with a list of two keywords, the output has three levels, with two Boolean values under each second-level item. In the three-level list, the second level represents the rooms, and each Boolean value at the bottom level indicates whether the corresponding keyword is contained in that room's name; the number of keywords equals the number of elements under each second-level item. Matching list levels like this is a common issue in Dynamo.
Figure 4-13: (Left to right) one keyword, one-keyword list, two-keyword list
Because "off" and the name that may be added later are in OR relationship, as long as there is a
true at the level 1, it means that the room represented by this list level 2 contains one of the
keywords. So through node Booleans.AnyTrue, multiple bool values at the bottom are turned into
one (Fig. 4-10).
Figure 4-14: Booleans.AnyTrue
In Dynamo, the order of each node's output list corresponds to its input list and is not disturbed. So when the first and second items in the output list of Booleans.AnyTrue are true, the corresponding first and second rooms in the earlier room list are offices. These rooms can then be filtered out through the node List.FilterByBoolMask (Fig. 4-11).
Figure 4-15: The list order is kept
When the filtered rooms are obtained, FloorPlanView.byRoom creates as many floor plan views as there are selected rooms. The code obtains the boundaries of each room through Room Boundaries and intersects the created floor plan with the room boundary to obtain a view that displays only that room.
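Outside the node graph, the same filtering chain (String.Contains per keyword, then Booleans.AnyTrue, then List.FilterByBoolMask) can be illustrated in a few lines of plain Python. This is only an illustration of the list logic with hypothetical room names, not part of the thesis workflow.

```python
# Illustration of the keyword filter: keep rooms whose name contains any keyword,
# ignoring case (String.Contains -> Booleans.AnyTrue -> List.FilterByBoolMask).
room_names = ["Office 101", "OFF 102", "Room 103", "Corridor"]   # hypothetical names
keywords = ["off"]                                               # e.g., "office" and "OFF"

# one Boolean per room: True if any keyword appears in the room name
mask = [any(kw.lower() in name.lower() for kw in keywords) for name in room_names]

# FilterByBoolMask equivalent: split rooms into "in" (offices) and "out" lists
offices = [name for name, keep in zip(room_names, mask) if keep]
others = [name for name, keep in zip(room_names, mask) if not keep]

print(mask)      # [True, True, False, False]
print(offices)   # ['Office 101', 'OFF 102']
```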
4.1.2.2 Hiding auxiliary lines
At the end of the previous step, this is the output result (Fig. 4-12).
Figure 4-16: Result from step 1
After creating the views with the room shape as the boundary, the code had to hide all the auxiliary lines in these views. This step is divided into two parts (Fig.
4-13).
Figure 4-17: Hide auxiliary lines
The purpose of both parts is the same: to hide unwanted content in the room view.
The two parts use View.HideElements. View.HideElements has three necessary inputs: elements
that need to be hidden, in which view to perform the operation, and whether to run this command
(Fig. 4-14).
Figure 4-18: View.HideElements
The screening steps in the first part are very simple. One finds the category corresponding to the
line, selects all the elements, and enters the view and elements into View.HideElements (Fig. 4-
15).
Figure 4-19: Part I
The difference between the two parts is that all standard lines can easily be obtained by Revit category, but one of the "lines" is actually a section mark (Fig. 4-16). In a plan view the section appears as a straight line, yet it is still a section view rather than a line element. Like the other lines, it is an element in the selected view, but it requires more complicated filtering steps than the first part.
Figure 4-20: The "line" is a section view line
Part II is for hiding this section line (Fig. 4-17).
Figure 4-21: Part II
In the second part, one first obtains all the elements in the input view through
Collect.ElementsInView in the Spring package, and then obtains the family type to which each element belongs through Element.GetParameterValueByName. The Spring package is a free add-on to
Dynamo that provides additional nodes.
One obtains the view type of the Building Section through ViewFamilyType and then compares all elements' family and view types with the family type of Building Section through the node List.Equals. Since the input structure of Collect.ElementsInView is a three-level list, where level 2 represents the input rooms and the bottom level holds all the elements in each room, the initial list structure must be preserved to the end in order to hide the view in each room (Fig. 4-18).
Figure 4-22: List structure
Therefore, "Keep list structure" must be set manually in List.Equals. Finally, the code filters out the section view through List.FilterByBoolMask and hides it (Fig. 4-19).
Figure 4-23: Keep list structure in List.Equals
The final result of steps 1 and 2 is a floor plan with the extra lines removed (Fig. 4-20).
Figure. 4-24: Result from steps 1 and 2
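A hands-off version of both parts could also be written as a single Dynamo Python node. The sketch below is only an illustration under stated assumptions (the "Section" type-name check in particular may need to change per project); it is not the graph that was actually run.

```python
# Sketch: hide line elements and section marks in each room view.
# The "Section" family-name check is an assumption and may differ per project.
import clr
clr.AddReference('RevitAPI')
clr.AddReference('RevitServices')
from Autodesk.Revit.DB import FilteredElementCollector, BuiltInCategory, ElementId
from RevitServices.Persistence import DocumentManager
from RevitServices.Transactions import TransactionManager
from System.Collections.Generic import List

doc = DocumentManager.Instance.CurrentDBDocument
views = UnwrapElement(IN[0])    # the room views created in step 1

TransactionManager.Instance.EnsureInTransaction(doc)
for view in views:
    to_hide = []
    # Part I: every line element visible in this view
    to_hide.extend(FilteredElementCollector(doc, view.Id)
                   .OfCategory(BuiltInCategory.OST_Lines).ToElements())
    # Part II: elements whose type name suggests a building section (section marks)
    for el in FilteredElementCollector(doc, view.Id).WhereElementIsNotElementType():
        el_type = doc.GetElement(el.GetTypeId())
        fam = getattr(el_type, "FamilyName", "") if el_type else ""
        if "Section" in (fam or ""):
            to_hide.append(el)
    ids = List[ElementId]([e.Id for e in to_hide if e.CanBeHidden(view)])
    if ids.Count > 0:
        view.HideElements(ids)
TransactionManager.Instance.TransactionTaskDone()

OUT = views
```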
4.1.2.3 Setting the Visual Style of the View
This step sets the visual style of the created views. The node Visual Style Select takes the view to be set and an integer as input (Fig. 4-21). The integer corresponds to the following styles: 1. Wireframe, 2. Hidden Line, 3. Shaded, 4. Shaded with Edges, 5. Consistent Colors, and 6. Realistic.
Figure 4-25: Visual Style Select
At this point, the floor plan view of the room has been completed (Fig. 4-22).
Figure. 4-26: Result from steps 1 through 3
4.1.2.4 Exporting the Views
For the views that have completed steps 1 to 3, one only needs to export them through Export Image by View in the Archilab package (Fig. 4-23). The Archilab package is another optional plug-in for Dynamo that supplies additional functionality. The node lets one specify whether to export a single image or a whole set of images. There are also options for Image Resolution and Size, but those can only be set when the Zoom Type is set to Fit To Page.
Figure 4-27: Step4 - views output
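The thesis used the Archilab node; for completeness, a raw Revit API equivalent can be sketched in a Dynamo Python node as below. The file path and pixel size are placeholders, and this is only an illustrative sketch of ImageExportOptions, not the node that was run.

```python
# Sketch: export each room view to an image with the Revit API (placeholder path/size).
import clr
clr.AddReference('RevitAPI')
clr.AddReference('RevitServices')
from Autodesk.Revit.DB import (ImageExportOptions, ImageFileType,
                               ExportRange, ZoomFitType, ElementId)
from RevitServices.Persistence import DocumentManager
from System.Collections.Generic import List

doc = DocumentManager.Instance.CurrentDBDocument
views = UnwrapElement(IN[0])                 # room views from the previous steps
folder = IN[1]                               # e.g. a folder path string (placeholder)

paths = []
for i, view in enumerate(views):
    opts = ImageExportOptions()
    opts.ExportRange = ExportRange.SetOfViews
    opts.SetViewsAndSheets(List[ElementId]([view.Id]))
    opts.ZoomType = ZoomFitType.FitToPage    # required for a fixed pixel size
    opts.PixelSize = 1024
    opts.HLRandWFViewsFileType = ImageFileType.PNG
    opts.FilePath = folder + "\\" + str(i + 1)   # base name; Revit appends view info
    doc.ExportImage(opts)
    paths.append(opts.FilePath)

OUT = paths
```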
4.1.3 Actual Operation
In the actual operation process, some changes were made to the above steps. Part I of section 4.1.1 was not applied: Revit stores the room boundaries as lists of lists when there are outlines of elements such as columns, so the script simply took the outside boundary of each room.
For section 4.1.2.2, all lines were taken out manually, and a view template that was transferred to each file was created, because the scripted approach hid elements individually in each newly created view rather than turning off a category. What section 4.1.2.2 shows is the hands-off approach, which is the ideal situation but would not run successfully.
4.1.4 Output Result
After running Dynamo on the actual data, 243 office floor plans were obtained (Fig. 4-24). After images with no furniture were manually deleted, 195 floor plans were left. Each image was exported at the highest quality Revit can output. Since the shape of each picture matches its room, the resolution of each picture is different; each image is about 4000*4000, which is convenient for subsequent processing.
Figure 4-28: Subset of 243 images (upper) and zoomed in view of three of them (lower)
4.2 Build Training Set by Photoshop
This section uses the pictures exported by Dynamo to make the training sets (Fig. 4-25). The whole process is first done manually, and then Photoshop batch operations are used to complete the training set production.
Figure 4-29: Actual methodology part 2
Based on the obtained pictures, the first step is to convert them to 1024*1024 through a batch operation. Step 2 is to manually mark the doors, windows, and furniture in each Group 3 image to generate Group 1 and Group 2. Since the total number of training images is not enough, step 3 triples the number of images in all three groups through batch rotation and mirroring, and the results are then assembled into training set 1 and training set 2. This section uses the first picture exported by Dynamo to explain the entire process (Fig. 4-26).
Figure 4-30: Entire process of part 2 in detail (top); lower two images are zoomed in versions of
the top complete image
4.2.1 Step 1: Batch Operation --- Resized
Pix2pixHD supports high-resolution image conversion, up to 2048 x 1024, but training on high-resolution images is very slow. At the same time, because of the abstraction processing, the high-resolution results are not much different from the low-resolution results. Images can always be downsampled from high to low resolution, but not upsampled from low to high, so in order to keep the option of changing this later, the first step is to unify the size of all room drawings to 1024*1024.
Steps 1 and 3 do not include operations that require manual judgment (identifying furniture, selecting colors, etc.), so they are suitable for batch scripting. Photoshop contains tools that run in batches and can record sequences of operations. In the Photoshop interface there is a panel called Actions, which can record each step of an operation (Fig. 4-27).
Figure. 4-31: Actions window in Photoshop
Before recording an action, a new action set needs to be created first, which is like a folder, and
the files in the folder are the recorded operations. Then one presses the plus button to create a new
record action. After clicking Record, it will start recording all the process until the user stops it
(Fig. 4-28).
Figure 4-32: Creating action set: Resized
The window interface after starting to record is very similar to the camera's recording interface:
pause, stop, and continue (Fig. 4-29). And each step of the operation will be recorded in the newly
created action "Resized."
Figure 4-33: Actions is recording
4.2.1.1 Step one
The exported pictures are of very high quality (4000 x 4000), and each is a different size. To give all images the same proportions in the end and to meet the training-set image requirements of machine learning, the following sequence was carried out. The main operations in this step are:
1. Zoom
2. Add a white 1024*1024 background
3. Export and rename
First, open the picture through Photoshop and unlock the layer (Fig. 4-30).
Figure 4-34 The image layer is initially closed
Then open Free Transform with the shortcut key ctrl+T, set 10%, and press check to confirm (Fig.
4-31).
Figure 4-35 Image 1 shrink in the original size background
Open Canvas Size by shortcut key ctrl+alt+c, modify the size to 1024*1024 (Fig. 4-32).
Figure 4-36:Change size from Original size to 1024*1024
Then one adds a white background. After creating a new layer and dragging it under the original layer, one uses the shortcut key shift+F5 to open Fill and sets the foreground color to white (Fig. 4-33). The Save operation in this step also needs to be recorded in the action; the save format is JPEG.
Figure 4-37 Option of Saving as JPG image
After completing this step, the recording of the actions ends. Each step of the operation is recorded
in the action window (Fig. 4-34).
Figure 4-38 Actions window after recording all operations
Then one selects Batch under File > Automate. In this step, one sets the folder containing all the data to be batch processed, the export folder, and the exported files' naming. Selecting a one-digit serial number renames the original files into numeric codes, which is convenient for managing a large amount of data.
Figure 4-39: Batch running
The result of this step is a folder with the same number of images as were exported from Dynamo, named with serial numbers (Fig. 4-36).
Figure 4-40 Results from step 1 (Group3)
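The same batch step could also be scripted outside Photoshop. The sketch below, using the Pillow library, is a hypothetical alternative (folder names are placeholders), not the Photoshop action the thesis recorded: it scales each exported image to 10%, centers it on a white 1024*1024 canvas, and saves it with a serial-number name.

```python
# Hypothetical Pillow alternative to the recorded Photoshop action (not used in the thesis).
import os
from PIL import Image

src, dst = "dynamo_exports", "group3"        # placeholder folder names
os.makedirs(dst, exist_ok=True)

for i, fname in enumerate(sorted(os.listdir(src)), start=1):
    img = Image.open(os.path.join(src, fname)).convert("RGB")
    # scale to 10%, matching the Free Transform step
    small = img.resize((img.width // 10, img.height // 10))
    # paste onto a white 1024*1024 canvas, centered
    canvas = Image.new("RGB", (1024, 1024), "white")
    canvas.paste(small, ((1024 - small.width) // 2, (1024 - small.height) // 2))
    # rename to a serial number, as in the Photoshop batch settings
    canvas.save(os.path.join(dst, "%d.jpg" % i), quality=95)
```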
4.2.2 Step 2: Color Images by Manual Operation
Because of Dynamo's long running time and the unsatisfactory quality of its exported results, this step is completed manually. First, one opens the picture in Photoshop (Fig. 4-37).
Figure 4-41: Open image by Photoshop
One selects the Pen tool (shortcut key P) and changes Path to Shape (Fig. 4-38).
Figure 4-42: Change path to shape
One uses the pen tool to trace and close the shape to get a complete black shape (Fig. 4-39)
Figure 4-43: Draw the shape
Then one hides this layer, creates a new layer, and uses the same steps to color the doors and windows on the new layer. One uses the Pen tool (P) to trace the window's location, clicks Fill and selects purple (255, 0, 255), and adjusts the Stroke to 9 px, the same width as the window. Then one draws the door and chooses green (0, 255, 0) for it (Fig. 4-40).
Figure 4-44: Set color
At this point, one of the Group 1 images can be exported (Fig. 4-41).
Figure 4-45: Images of Group 1
Then one repeats the above steps to color the furniture (Fig. 4-42) and unhides all the layers to export Group 2 (Fig. 4-42).
Figure 4-46: Color the furniture
4.2.3 Step 3: Batch Operation --- Mirrored and Rotated
In step 3, to increase the size of the training set, the results obtained in step 2 are batch rotated and mirrored (Fig. 4-43). If there had been enough data in the original set from the firm, this step would not have been necessary. First, one records all the operations and then runs the batch, just as in step 1. The rotated and mirrored versions of image 1 are named 1_2 and 1_3.
Figure 4-47: Rotated and mirrored operations records by actions window
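As with step 1, this augmentation could also be scripted. The Pillow sketch below (placeholder folder name; the rotation angle is an assumption, since the thesis does not state it) illustrates producing a rotated copy "_2" and a mirrored copy "_3" for every image; it is only an illustration, not the recorded Photoshop action. The same transforms would have to be applied identically to Groups 1, 2, and 3 so that the file names keep corresponding.

```python
# Hypothetical Pillow sketch of the rotation/mirroring augmentation (illustrative only).
import os
from PIL import Image, ImageOps

folder = "group3"                                  # placeholder folder of 1024*1024 images
for fname in sorted(os.listdir(folder)):
    name, ext = os.path.splitext(fname)
    img = Image.open(os.path.join(folder, fname))
    # "_2": rotated copy (90 degrees is an assumption)
    img.rotate(90, expand=True).save(os.path.join(folder, name + "_2" + ext))
    # "_3": mirrored copy
    ImageOps.mirror(img).save(os.path.join(folder, name + "_3" + ext))
```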
4.2.4 Results
Through the combination of manual Photoshop work and automated batch operations, two training sets are finally obtained. The first training set contains Groups 1 & 2, and the second contains Groups 2 & 3. The file names and contents of the three groups of pictures correspond to each other (Fig. 4-44).
Figure 4-48: Part of Groups 1,2, & 3 (upper) and zoomed in views of six sets (lower)
4.3 Future Work Application
When Model 1 is obtained, it can predict the category and location of furniture for a given room shape. One future work application is to reintroduce the generated result back into the Revit model (Fig. 4-45).
Figure. 4-49: Future work methodology diagram
This section introduces a feasible framework for this future work, starting from a Model 1 output diagram as an example to explain the specific logic of the Dynamo code, and then summarizing the problems that cannot be solved at this stage.
This future work applies the image-recognition and image-generation machine learning models to an actual workflow; this step is the last piece of the puzzle in integrating the entire process. The goal is to export the room shape map from an existing empty Revit office room, put it into the trained model, generate the room furniture prediction map, and finally, based on the color and location information in the image, have Dynamo place 3D Revit furniture into the empty room.
The main functions mentioned above are introduced previously in this chapter and in Chapter 5
Model Training. So this section will introduce in detail the Dynamo logic framework and node
that can theoretically realize this function. Suppose one creates a new empty room in Revit.
Through the steps that have been completed and previously described, the room shape map that
meets the requirements of the training set is exported. At this stage, it is necessary to use the new
Dynamo code to import the information of the map generated by the model back to Revit (Fig. 4-
46).
Figure. 4-50: Future work process
The main steps to accomplish this are the following (Fig. 4-47):
1. Read the picture and sample the pixels.
2. Filter colors and calculate the center point of each color block,
3. Select the center point of the green square and the placement coordinates of the door in the
Revit model as the reference point, and use the door as the reference object for the conversion
between the picture and the Revit model.
4. Calculate the vector size from the center of the green (door) to the center of the blue (table) on
the picture. The resulting vector is expanded according to the actual unit in the Revit project.
Displace the door coordinates in the Revit model to a new point through the obtained vector and
place the furniture corresponding to the color at this point.
Figure. 4-51: Future work Dynamo code
4.3.1 Read Picture and Pixel Sampling
For a single, simple example, Dynamo can achieve all the above functions.
First, the code retrieves the picture from the local computer through the combination of the nodes File Path, File From Path, and ReadFromFile, and displays it in Dynamo through the node Watch Image (Fig. 4-48).
Figure 4-52: Dynamo: image Input
Then it reads the width and height of the picture through the node Image.Dimensions; both are measured in pixels. The images generated by the model are 256*256, so the width and height obtained here are also 256. The purpose of obtaining the width and height is to be able to set the number of pixels sampled from the original image by the node Image.Pixels. In Image.Pixels, xSample and ySample respectively specify how many points are sampled at equal distances along the x-axis (width) and y-axis (height) of the image (Fig. 4-49). The smaller these inputs, the less computation is required and the faster it runs, but the trade-off is that the center point of each color block may not be located as accurately.
Figure 4-53: Dynamo: pixel sampling
The Image.Pixels output list shows the sampled pixels. In this list, the second-level lists represent rows (y), and the positions within each bottom-level list represent columns (x); the list therefore acts as a coordinate grid (Fig. 4-50). This means that as long as one can filter out the required color, the index of the list it belongs to, and its index within that list, the code can determine its coordinate value.
Figure 4-54: Coordinates from the list
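The equivalent of Image.Dimensions and Image.Pixels can be sketched in plain Python with Pillow (the file name is hypothetical); it builds the same row/column list structure described above. This is only an illustration of the idea, not the Dynamo nodes themselves.

```python
# Sketch of the pixel-sampling idea: read the image and sample an x/y grid of pixels.
from PIL import Image

img = Image.open("synthesized_room.png").convert("RGB")   # hypothetical file name
width, height = img.size                                   # 256 x 256 for model output

x_samples, y_samples = 64, 64    # fewer samples = less computation, less precision
pixels = []
for row in range(y_samples):                       # second level: rows (y)
    y = int(row * (height - 1) / (y_samples - 1))
    row_colors = []
    for col in range(x_samples):                   # bottom level: columns (x)
        x = int(col * (width - 1) / (x_samples - 1))
        row_colors.append(img.getpixel((x, y)))   # (R, G, B) tuple
    pixels.append(row_colors)
```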
4.3.2 Filter colors and calculate the center point of each color block
This step mainly converts the information on the picture into the data in the Revit model (Fig. 4-
51).
Figure 4-55: Dynamo code for center coordinate of Blue
There are mainly two levels of work flow that need to be addressed:
1. Extract the information from the picture.
2. Convert the information on the picture to the Revit model.
The required data at these two levels must correspond to each other. In the example, the placement point of the furniture in the Revit model is at the center of the furniture, so the center point of the color block is the corresponding point at the image level. Once the center point is obtained, the vector AB of the furniture movement can be obtained (Fig. 4-52).
Figure 4-56: Vector of furniture movement
In order to obtain the center point of the color block, the minimum coordinate point and the maximum coordinate point of the color block must be found. When these two points are connected, they form the diagonal of the color block, and the midpoint of that diagonal is the center of the color block (Fig. 4-53).
Figure 4-57: Center of the diagonal
First, the colors are filtered out. Because several of the marker colors share maximum RGB channel values, it is difficult to filter using RGB directly: although blue could be selected with B = 255, white is (255, 255, 255), so that rule would also select white. In addition to RGB, the HSB color mode can be used to filter colors. HSB represents hue, saturation, and brightness; in this mode, H ranges from 0 to 360, and its advantage is that a single value can represent the hue of a color. In this image, the hue of the green pixels in the green box is 119-121, and the hue of the blue pixels is 239-241. So in Dynamo, the hue value is read through Color.Hue and filtered with the node == (Fig. 4-54).
Figure 4-58: Color filter
A customized node extracts the index of the list in which each true element is located and its index within that list. The exported x and y values are each placed in a separate list (Fig. 4-55).
Figure 4-59: Customized Python code and its output list
Because the x and y values on this axis are all positive, the point closest to the upper left corner (minimum) and the point closest to the lower right corner (maximum) can be found by filtering on the sum of the two values. The midpoint between these two points is then obtained directly from the coordinates. Since the image was reduced by a factor of ten earlier, it is enlarged ten times here, giving the midpoint of the blue color block. The final result is the two values of the midpoint coordinate rather than an actual point object, which is convenient for the later calculation (Fig. 4-56).
Figure 4-60: Center by max and min points
The green and red color blocks are selected with the same steps. When this is done, the midpoints of all the color patches have been obtained.
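The hue filter and midpoint calculation can also be sketched in plain Python using the standard colorsys module; the hue ranges follow the values given above, the file name is hypothetical, and the near-black guard is an added assumption. This illustrates the logic only and is not the customized Dynamo node itself.

```python
# Sketch: find the center of a color block (e.g., the blue table) by hue filtering.
import colorsys
from PIL import Image

def block_center(path, hue_min, hue_max):
    img = Image.open(path).convert("RGB")
    hits = []
    for y in range(img.height):
        for x in range(img.width):
            r, g, b = img.getpixel((x, y))
            hue = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)[0] * 360
            if hue_min <= hue <= hue_max and max(r, g, b) > 60:   # skip near-black noise
                hits.append((x, y))
    if not hits:
        return None
    # min/max corner by the sum of coordinates, then the midpoint of that diagonal
    p_min = min(hits, key=lambda p: p[0] + p[1])
    p_max = max(hits, key=lambda p: p[0] + p[1])
    return ((p_min[0] + p_max[0]) / 2.0, (p_min[1] + p_max[1]) / 2.0)

blue_center = block_center("synthesized_room.png", 239, 241)   # table
green_center = block_center("synthesized_room.png", 119, 121)  # door
print(blue_center, green_center)
```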
4.3.3 Reference point for coordinate transformation
The position of the green color block (the door) does not change during the Revit export process or the machine learning conversion process. On the premise that there is only one door in each office room, the center point of the green color block is therefore the most suitable reference point. The Dynamo code gets the coordinates of the door in Revit through Element.GetLocation (Fig. 4-57).
Figure 4-61: Get Door Coordinate
However, the coordinate point of the door is not consistent with the center of the green box (Fig. 4-58). The actual coordinate of the door is located at the midpoint of the wall opening, so before the final calculation of the displacement vector, the obtained coordinates need to be translated upward.
Figure 4-62: Door's Revit data location (upper) and door's image location (lower)
4.3.4 Calculation and Furniture Placement
Through conversion, a unit in the actual Revit model (mm) is nearly 40 times a picture coordinate (pixel). Since half the width of the door is roughly 0.4 meters, 400 is subtracted from the converted result. The two values obtained are added to the coordinates of the door (Fig. 4-59), and a table is placed at this position. The chair undergoes the same operation.
Figure 4-63: Calculation
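The arithmetic can be summarized in a short sketch. The values follow the example above (a scale of roughly 40 mm per pixel and a half-door width of about 400 mm), but all coordinates are hypothetical, and the direction of the door offset and the y-axis flip are assumptions; the actual Revit placement call is only indicated, since it depends on the Dynamo nodes used.

```python
# Sketch of the pixel-to-Revit conversion used to place the table relative to the door.
SCALE = 40.0        # approx. mm in the Revit model per pixel in the picture
HALF_DOOR = 400.0   # approx. half of the door width in mm

# centers found in the image (pixels), e.g. from the hue-filter step (hypothetical values)
door_px = (128.0, 20.0)
table_px = (150.0, 140.0)

# vector from door to table in the picture, expanded to Revit units (mm);
# the image y-axis points down, so it is flipped for the model (assumption)
dx = (table_px[0] - door_px[0]) * SCALE
dy = -(table_px[1] - door_px[1]) * SCALE

# door coordinates from Element.GetLocation, shifted because the door's Revit
# location sits at the midpoint of the wall opening, not at the green block center
door_revit = (10000.0, 5000.0)                      # hypothetical mm coordinates
door_ref = (door_revit[0], door_revit[1] + HALF_DOOR)

table_point = (door_ref[0] + dx, door_ref[1] + dy)  # place the table family here
print(table_point)
```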
Although the function is realized, many parameters cannot be adjusted, such as the orientation of
the furniture (Fig. 4-60).
Figure 4-60: 3D Revit office
4.3.5 Future Work Summary
For a single room model, this methodology is feasible and does not require a long running time. However, there are concerns that this process does not have potential for widespread application: Dynamo code based on this framework suffers from low accuracy and low versatility.
· Low accuracy
1. This ideal Dynamo-based method assumes that pix2pixHD generates images with completely clear boundaries. In fact, the generated images are somewhat fuzzy and the colors are not exact, which places high demands on the color-screening process.
2. In converting between model and picture coordinates, the scale ratio and the width of the door must be measured. Minor errors in these numbers can cause the furniture to overlap other elements or fail to sit against the wall, so the actual quality is not ideal. A better method is needed for determining the exact coordinates of the door.
· Low versatility
1. Although this simple example is successful, in the actual training set a table is not just a quadrilateral. There is usually more than one table in an office, and more than one kind of table and chair. Three colors cannot represent all the families in a room.
2. A complete building model contains more than one office, and an office does not necessarily have exactly one door. This causes the door to fail as a reference point.
3. At this stage, three pieces of software have to be managed in this step, from Dynamo to Photoshop to pix2pixHD and back to Dynamo. If this is to become a common tool, the process needs to be integrated and simplified.
4. At this stage, this is only the initial stage of a project; it simply realizes color recognition, position calculation, and furniture placement. If this tool is to be perfected, more advanced image-based artificial intelligence techniques are necessary, not just a GAN; this would probably include the use of image recognition.
4.4 Summary
This chapter introduced how the entire training set was completed with Photoshop when the theoretical Dynamo method was unsuccessful. Note that all the steps done in Photoshop were originally intended to be done in Dynamo.
A machine learning training set requires a considerable quantity of data. If its production cannot be fully automated, any change consumes an enormous amount of time. BIM software contains many kinds of information and offers the advantages of large, rich data sets; however, this also causes an unavoidable difficulty: the time cost of processing and extracting large amounts of data. Machine learning is a tool that can calculate probabilities and predict results based on existing experience and information, so essentially it should be a time saver. But the process of building the model takes far more time, which may be part of the obstacle to the wider adoption of modern artificial intelligence.
Another problem appeared in the actual process of deriving the training set in this study. The office floor plans were extracted from only one or two projects, so the diversity of the data is shallow; this shows in the shapes and orientations of the rooms. Because the number of images was not sufficient, operations such as rotation and mirroring were used, which means the trained results can no longer treat room orientation as a factor in analyzing the output. However, when the training set can be derived faster and more automatically in the future, its diversity will certainly increase, and the training results will be more valuable for analysis.
Chapter 5 discusses the entire training process, including training, testing, and analyzing the test results.
5. DETAILED TRAINING PROCESS
Chapter 4 introduced the process of obtaining the training set. The final training set folder has three groups: the first group contains the processed room shapes with doors and windows; the second group adds the furniture to the first group; and the third group is the actual furnished floor plan. The order of the rooms in each group is the same, and the file names of the same room in different groups are the same (Fig. 5-1).
Figure 5-64: 3 Groups images from Chapter 4 (top); larger views of two examples for the three
sets
Pix2pixHD can learn the relationship between two kinds of pictures, and the trained Pix2pixHD
can convert the input picture into another picture. So in the three groups of images, any choice of
two groups can be turned into a training set for pix2pixHD to learn.
In Chapter 5, a total of three training sets were made: Groups 1 & 2, Groups 2 & 3, and Groups 1 & 3. These three training sets are used to train three pix2pixHD models that perform different image conversion functions, called Model 1, Model 2, and Model 3 respectively (Fig. 5-2).
Figure 5-65: Training sets for 3 models
Model 1 and Model 2 will be nested together after training (the output of Model 1 will be used as
the input of Model 2). This combination of the two models is called a nested model. The results
generated by the nested model will be compared with Model 3. What they have in common is that
Group 1 style pictures are converted into Group 3 style office floor plans. The goal is to obtain a
model with the highest quality (Fig. 5-3). At the same time, Model 1 is also an important part of
future application work in Chapter 4.
Figure 5-66: Nested model compared to Model 3
5.1 Training
This section will introduce the training process of the three models. In order to facilitate the
understanding of the reference values, the pix2pixHD model will be briefly introduced here.
There are two main neural networks in pix2pixHD: the generator (G) and the discriminator (D). They are trained at the same time. The function of G is to convert the input picture into an output picture; D uses the paired picture in the training set to judge the output generated by G. In short, G constantly generates fake pictures and keeps pushing them closer to the real pictures (Fig. 5-4), while D is constantly trained to discern whether pictures were generated by G. There is a constant confrontation between G and D.
Figure 5-67: Training process of pix2pixHD
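This confrontation can be written compactly as the standard conditional GAN objective, shown here only as background; pix2pixHD's actual objective additionally uses multi-scale discriminators and feature-matching terms, which are not reproduced here.

```latex
\min_{G}\max_{D}\;\mathcal{L}_{\mathrm{cGAN}}(G,D)
  = \mathbb{E}_{(x,y)}\!\left[\log D(x,y)\right]
  + \mathbb{E}_{x}\!\left[\log\!\left(1 - D\!\left(x,\,G(x)\right)\right)\right]
```

Here x is the input image (e.g., a Group 1 room-shape map) and y is the corresponding ground truth (e.g., the Group 2 furniture map).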
One can use training set 1 and Model 1 as an example. Group 1 in the training set is called the input. The input is converted by G, and a new picture is generated; this picture is called the synthesized image. Group 2 in the training set is called the ground truth and is used to train D. The ground truth is the "standard answer" in a GAN. This standard answer continuously improves D's understanding of real pictures, which makes D stricter on the pictures generated by G and forces G to improve their quality. Through this cycle, the quality of the synthesized images generated by G constantly approaches the standard answer, the Group 2 picture, finally realizing the image conversion function.
Figure 5-68: Screen shot from the Linux terminal
The model was trained on the rented training platform Paperspace using an Nvidia Quadro P6000 (Fig. 5-6).
Figure 5-69: Graphic card check by !nvidia-smi
5.1.1 Training Process
Train.py was run through a shell command in Paperspace to start the training process. The Python script is located in the pix2pixHD folder, and the hyperparameter settings are passed along with the start command. In the pix2pixHD code officially released by Nvidia, there is an "options" folder, and the hyperparameters that can be adjusted for training and testing are defined in the files in that folder (Fig. 5-7).
Figure 5-70: parts of hyperparameter in the folder
Train.py is run with the command python train.py. The flag --label_nc 0 makes the model read the RGB pixel values directly, and --no_instance disables the instance map. --dataroot gives the location of the training set, and --name sets the name under which the training files are saved. --save_epoch_freq 5 adjusts how often the model is saved, and --loadSize 256 scales the training images to 256*256 pixels (Fig. 5-8). Using Group 1 as an example, training with loadSize 256 took about 24 hours less than training at 1024.
Figure 5-71: Training instructions
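Putting the flags described above together, a start command like the following could be issued. It is shown here as a Python subprocess call only so it reads as one block; the dataset path, run name, and working directory are placeholders, not the thesis's actual values.

```python
# Sketch of the training start command assembled from the flags described above
# (paths and the run name are placeholders, not the thesis's actual values).
import subprocess

subprocess.run([
    "python", "train.py",
    "--name", "office_model1",            # folder name for the saved training files
    "--dataroot", "./datasets/office",    # location of the training set
    "--label_nc", "0",                    # read RGB images directly
    "--no_instance",                      # no instance maps
    "--loadSize", "256",                  # scale training images to 256*256
    "--save_epoch_freq", "5",             # save the model every 5 epochs
], check=True, cwd="pix2pixHD")           # run inside the pix2pixHD folder
```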
The model first checks the training set: if file names do not match between groups or the numbers of images differ, training will not start (Fig. 5-9).
Figure 5-72:Training set detection
5.1.2 Model 1
In practice, Model 1 was trained twice. The first run crashed partway through training; after 33 hours of training, the model had not achieved high quality and even failed to generate clear box boundaries (Fig. 5-10).
Figure 5-73: In every group:(from left to right) Input, Synthesized, Ground truth
The training set was adjusted for the second run. The differences from the original training process were that the learning rate of the model was adjusted and the hyperparameter --loadSize was set to 256; the default setting of 1024 would resize the training images to 1024*1024. As a result, the training time of each epoch dropped from an average of 580 seconds to 93 seconds.
For the second run, the quality improved with each epoch: the boundaries of the color blocks are clear, there are no fuzzy color blocks, and the positions are reasonable. By the 100th epoch, the model was able to produce pictures with clear boundaries, and the location of the furniture can be analyzed once the boundaries are clear (Fig. 5-11).
Figure 5-74: Result from Model 1 Ver 2 epoch 1,
Based on the actual meaning of the pix2pixHD loss values introduced earlier, a chart was made of the loss values against the training progress (epochs); the fluctuations in the chart show that the model was working properly.
In the actual training process, the model outputs three values for every 100 pictures trained: G_GAN, D_real, and D_fake. G_GAN is the loss value of the generator; D_real is the loss value of the discriminator on real pictures; and D_fake is the loss value of the discriminator on pictures generated by G. In a non-GAN machine learning model, the machine learns by means of a loss function, a method of evaluating how well a specific algorithm models the given data: if predictions deviate too much from actual results, the loss function returns a very large number. There is usually only one loss function, and its convergence indicates the success of the training. But in a GAN, because the two networks compete against each other, when the loss of G becomes lower, D_real becomes higher and D_fake becomes lower. The values recorded during training therefore fluctuate all the time, which for a GAN means that training is proceeding smoothly, and the larger the fluctuation range, the higher the diversity of the training set (Zheng, 2017). In the end, whether the model is successfully trained still has to be judged by the quality of the synthesized images.
For example, in the terminal output (second and third lines of Fig. 5-12), at epoch 27 and iteration 90, G_GAN equals 1.585, D_real is 0.231, and D_fake is 0.136; later, at iteration 390, G_GAN is 1.266, D_real is 0.455, and D_fake is 0.154. Taken individually, the lower each value is, the better that network is doing its own job. But this is a confrontational process: if one loss converges, it means one of the networks has stopped competing, and for a GAN that means the model has collapsed.
Figure. 5-75: Screen shot from Linux terminal
The loss value charts show the trend of the loss values of G and D as the training epoch increases; the X axis is the number of epochs, and the Y axis is the loss value of G or D. In this model, the three loss values kept fluctuating, which indicates that the GAN was continuously training and did not collapse. The fluctuation range of G_GAN is between 0.2 and 3.5, the main fluctuation range of D_real is between 0 and 0.2, and the fluctuation range of D_fake is between 0 and 1.2. These values cannot be compared with each other directly; another pix2pixHD run with a different training set would be needed to judge the diversity of the training sets by comparing the ranges of fluctuation. Since the loss functions of the two networks in a GAN cannot both converge, the quality of the synthesized image has nothing to do with the loss function.
Figure 5-76: Loss value for every 100 images (X axis: epoch; Y axis: loss value) for G_GAN, D_real, and D_fake
The model randomly records the generation quality of one picture in each epoch during training. When the model is set to 200 epochs, 200 synthesized pictures are recorded. Ideally, the quality of the synthesized pictures should continue to improve (Fig. 5-13).
. . .
Figure 5-77: One random generated image from every epoch in model 1_2 (above); enlarged
view of training start and end showing success as the images have gotten less blurry (below)
5.1.3 Model 2
The results are clear. Due to the various types of furniture in the training set, the shape of the chairs (armrest and backrest) is fuzzy, but they can be recognized (Fig. 5-14).
Figure 5-78: Result from Model 2 epoch 1, 50, 150, 200
Since this training set contains more colors (Group 2) and complex lines (Group 3), the fluctuation ranges of the three loss values are larger than for Model 1, which again indicates that the model is training properly (Fig. 5-15).
Figure 5-79: Loss value of Model 2 (G_GAN, D_real, and D_fake)
The image quality is also increasing with the epoch (Fig. 5-16).
Figure 5-80: One random
generated image from every epoch in model 2
5.1.4 Model 3
The purpose of Model 3 is to compare nested and non-nested models. Model 1 and Model 2 are nested together after training: the output of Model 1 is used as the input of Model 2. The nested model realizes the conversion from Group 1 to Group 3 without showing the intermediate Group 2 image. When two models are trained to achieve the same function, is there a difference in quality?
Figure 5-81: Model 3
The entire training process was very slow. Only after the model had trained for 150 epochs did the quality of the image generation reach a discernible level.
Figure 5-82: Results of Model 3 from epoch 50,100,150,200
After 200 epochs of training were completed, the quality of the model was still unacceptable; compared with Model 2, the quality of Model 3 is lower. This might be solved by increasing the number of training epochs, but that would take more time. Therefore, Model 3 did not participate in the subsequent testing process, and later testing focuses on the nested combination of Model 1 and Model 2.
5.1.5 Summary
The training process is the most time-consuming part of this research. With low training set diversity, the model can reach a clear state at an earlier training stage. Low diversity is a term used here when the shapes and areas are very similar; a clear state is when the synthesized image and the ground truth are almost the same, with no blurry or mixed colors.
Training quality is affected not only by the quality and quantity of the training set but also by the learning rate of the machine learning model. Learning rate is a general concept in machine learning; in pix2pixHD it represents the magnitude of each update to the neural network weights. If the learning rate is too large, the optimized parameters fluctuate around the minimum value (Fig. 5-19), and when a large number of parameters continually fail to reach their optimal values, the quality of the model will never reach the ideal state (close to the ground truth).
Figure 5-83: Learning rate (Jordan, 2018)
Therefore, adjusting the learning rate to a lower value is an effective and safe method, especially because the training process cannot be interrupted and resumed. This run took 33 hours, but it could have been even longer.
The relationship between model quality and loss function is very unclear, and it can be said there
is no relationship. In the process of image model training, it should be more important to focus on
the output image quality than to make the loss function converge.
Model nesting is an effective method, although the possibility that Model 3 collapsed during training cannot be ruled out. In any case, Model 3 shows that pix2pixHD finds it more difficult to recognize lines, because a line contains less information (fewer pixels).
5.2 Test
Starting from epoch 150 for Model 1 and Model 2, the synthesized images selected at random are very close to the ground truth, which shows that training is essentially complete from epoch 150 on. This section tests the final results of the nested model. The testing process is to input the images in the test set into Model 1, and then input the synthesized images from Model 1 into Model 2 to obtain the final room plan predicted by the nested model (Fig. 5-20). This section includes the introduction of the test set, the test operation, and the analysis of the test results.
Figure 5-84 Nested Model by Model 1 and Model 2
5.2.1 Test Sets
The function of the test set is to test the model and to analyze the results of the model for practical
applications. So before training, the test set needs to be completed. The test set is usually extracted
from the data in the training set. They have the same format as the training set, but do not
participate in training.
Figure 5-85 Part of test set for Model 1
The synthesized pictures are analyzed by comparing them against the ground truth. The analysis covers picture quality (color and degree of blur) and the furniture locations in the synthesized images versus the ground truth.
Model 2 is simpler: its synthesized images convert a given abstract drawing of furniture types and locations into an ordinary floor plan, so it has no analytical significance for furniture location.
5.2.2 Test process
The start-up steps for testing are similar to those for training, and the test command also requires hyperparameters (Fig. 5-23). In addition to the hyperparameters, it is necessary to create a new folder for the trained model in the pix2pixHD directory and copy the generator to be tested into it. The test process uses only the generator, not the discriminator.
Figure 5-86: Test hyperparameter setting
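The corresponding test start-up can be sketched the same way, again with placeholder names and values; --which_epoch and --how_many are standard pix2pixHD test options, but the exact values used in the thesis are not recorded here.

```python
# Sketch of the test start command for a trained generator (placeholder names/values).
import subprocess

subprocess.run([
    "python", "test.py",
    "--name", "office_model1",            # must match the trained checkpoint folder
    "--dataroot", "./datasets/office",    # folder containing the test images
    "--label_nc", "0",
    "--no_instance",
    "--loadSize", "256",
    "--which_epoch", "200",               # which saved generator to load
    "--how_many", "50",                   # how many test images to run
], check=True, cwd="pix2pixHD")
```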
5.2.3 Nested Model
In the first step, input images are entered into Model 1 to get its synthesized images. The clarity of these images is lower than that of the ground truth, but the boundaries of most of the color blocks are very clear (Fig. 5-24).
Figure 5-87: Input images and its synthesized images from test of Model 1
At the same time, the furniture locations in most of the office images are consistent with the ground truth, and some of the tables have a different shape in the same location, resulting in irregular shapes. Both inputs shown were obtained by rotating and mirroring the same image.
Figure 5-88: Input, Synthesized, Ground truth
One directly imports the synthesized images of Model 1 into Model 2 for floor plan generation. Unfortunately, the results were not ideal: all of them show blurred and distorted boundaries.
Figure 5-89 Nested process test
On closer observation, the periphery of each color block in the synthesized images of Model 1 is not perfectly clean (Fig. 5-26). Although these color differences are invisible to the naked eye, when the actual values are read, the pixels around each color block are not pure black RGB (0, 0, 0). These wrongly colored pixels mislead Model 2.
There are two hypothetical solutions:
1. Increase the number of pixels in the pictures so that the noise around the color blocks affects the model less; as the pixel count increases, the noisy boundary occupies a smaller percentage of pixels, reducing the impression of blurring.
2. Use the synthesized images of Model 1 directly as the training input of Model 2, so that the noisy boundaries can be learned and ignored through the machine learning process.
Figure 5-90: The area around the color block is blurry in Model 1 synthesized images
Of the two methods, the second saves more time: it takes 33 hours to train a high-quality Model 1, while training a lower-quality Model 1 takes only 6 hours.
5.3 Summary
Pix2pixHD training is a very time-consuming process. In the case of 1024 pixels by 1024 pixels
per picture, each model takes about 32 hours. The final quality of the trained model depended
mainly on the quality and quantity of the training set and the learning rate of the model. The
hyperparameters of the model cannot be changed during the training process (except for a
gradually decreasing learning rate). If the model training fails while underway, nothing has been
saved and everything must be restarted.
During model training, the output loss values have no reference value; a human must judge visually whether the pix2pixHD model has collapsed. If the picture quality has not improved, or the pictures are still unclear after 100 epochs, training should be stopped to reduce wasted time and restarted with a lower load size or a lower learning rate.
Because the training set comes from only two projects, the test set extracted from the original training data produces ideal results with very high clarity. But when a new room shape is input, the internal furniture is usually arranged unexpectedly: the model does not perform well in the face of unseen shapes (Fig. 5-28).
Figure 5-91: Synthesized images for a new office room shape have low image quality
For the nested model, the synthesized images of Model 1 are not clean, which leads to unsatisfactory results from Model 2. Given sufficient time, a higher-quality Model 1 could be trained at 1024*1024; higher-quality pictures would reduce the impact of the blurred borders.
In order to improve the versatility of the model, the diversity of the training set should be increased. Ideally, it should include offices of different areas, different orientations, and different shapes. In addition, the number of images needs to be larger than it is now without resorting to rotation and mirroring. The training set is everything for this type of machine learning study.
There were a total of 243 pictures from the company, and images with no furniture, duplicates, unclear images, and incomplete room borders were deleted. In the end, 175 pieces of original data were left. Through Photoshop rotation and mirroring, a total of 525 training pictures were obtained; more than 1,000 original pictures would have been preferable. It would have been better not to use any rotation or mirroring, but a larger training set was needed, so they were used.
During the test, it was found that the model can recognize the size of the room and the position of
doors and windows to change the layout.
The nested model has a higher quality and a more reasonable layout than a floor plan generated
directly from the shape of the room. This process proves that marking images in a way that can be
read by the model can improve the quality of the production.
6. CONCLUSION, DISCUSSION, AND FUTURE WORK
A method based on a machine learning model was developed that can automatically generate office furniture layout images and a Revit 3D model of an office when the shape of the office room and the positions of the windows and doors are known. The original office drawings were provided by a designer at an architecture office. The training set was obtained through the combination of Dynamo and Photoshop. The whole process can be summarized as follows: obtain the training set, train two pix2pixHD models, and create a Dynamo code that can read pictures and import the results generated by pix2pixHD into Revit. However, in the actual operation process, many unforeseen problems appeared, which made the actual methodology diverge greatly from the theoretical methodology. This chapter describes the context of the work, what was achieved, the limitations in the process, potential future work, and a conclusion to the research (Fig. 6-1).
Figure 6-92: One sample for whole process
6.1 Context
BIM and artificial intelligence can be used together. BIM is an information integration system in
the AEC industry, which enables users to track various information of buildings in different life
cycles. This large, trackable and processable information system provides opportunities and data
for machine learning to perform. In addition, BIM software such as Revit has a built-in
programming language, Dynamo, that can be used to enhance features and provide interoperability
capabilities between software programs. A machine learning model is a way to predict future data
by learning existing data and experience. It is a kind of artificial intelligence technology. Compared
with ordinary computer programming software, machine learning models have the ability to
improve themselves. The combination of BIM and machine learning makes artificial intelligence
full of potential in the field of construction.
The encounter between architecture and artificial intelligence did not appear suddenly. Over the past century, architects have attempted to standardize modular building design and to integrate computers with architecture. Being able to control and change design parameters helped lead to parametric design and other forms of computational design, and then to artificial intelligence, which tries to find more correlations between parameters. This development process follows a pattern. The widespread use of BIM potentially provides better conditions for machine learning data; with the help of BIM software, exporting a pix2pix training set becomes possible. Pix2pix builds on other researchers' previous work in other disciplines and on examples in architecture.
In recent years, image processing capabilities have exploded with the emergence of the deep learning model GAN. This research therefore used software in the BIM system to export a training set in image format and applied it to the image-generation machine learning model pix2pixHD to realize automatic planning of office furniture.
Previous automated apartment room layout has been accomplished with GANs (see Chapter 2.3) (Fig. 6-2). The idea of transforming a room shape into a floor plan was borrowed from Zheng Hao's study of apartment floor plan generation.
Figure 6-2: Apartment floor plan generation by Pix2pix (Zheng, 2018)
The idea of the nested model was borrowed from ArchiGAN. Whereas ArchiGAN focused on entire floor plan layouts, it provided a good background for realizing the design of the office layouts (Fig. 6-3).
Figure 6-3: Nested model for apartment floor plan generation (Chaillou,2019)
6.2 Methodology
Through the building information modeling software Revit and its built-in Dynamo visual coding program, one can extract the parameters one needs from a 3D information model of a building. Dynamo can be used to extract training set images that can be learned by pix2pix. Pix2pix, as a GAN-based image conversion machine learning model, uses internal generative and discriminative networks that compete against each other to optimize the output results. It uses a large number of paired pictures as a training set to obtain the ability to convert between the two kinds of pictures, and it can then convert pictures outside the training set. In this case, by learning the pictures exported by Dynamo, pix2pix can automatically generate the indoor layout when given a specified room shape and the positions of the entrance and windows. Since the original BIM data set could not be directly accessed or operated on by the researcher, the methodology of the initial hypothesis differs from the actual method used.
In the theoretical methodology, Dynamo can fully automatically extract the required training set.
Ideally, Dynamo can obtain a large number of floor plans and processed floor plans in a short time,
complete the training sets of the two models, and make a complete Dynamo-based Revit
conversion function from images to 3D models (see chapter 3.1).
But in the real situation, Dynamo not only needed to export a large amount of data but also needed to make changes to a large amount of data. The operation time was nearly 20 hours longer than expected and was accompanied by problems such as unstable operation and low output quality. Compared with the theoretical methodology, the actual methodology takes the actual situation into consideration, rewrites the Dynamo code used to obtain the training set to minimize Dynamo's running time, and finally obtains a normal office layout through the combination of manual and batch operations in Photoshop (Fig. 6-4).
Figure 6-4: Actual Methodology
In the model training part, a total of three pix2pixHD models were trained, using three different training sets (Fig. 6-5).
Figure 6-5: Training sets for Model 1, 2 & 3
The training set pictures were divided into three groups. Group 1 contained the room shape and color block diagrams of doors and windows; Group 2 contained the addition of color blocks of furniture based on Group 1, with each color representing a kind of furniture; the third group was an ordinary floor plan. In each group, a file name corresponded to a room, and the number of pictures in each group was the same and corresponded to each other (Fig. 6-6).
Figure 6-6: 3 Groups images from Chapter 4 (top); larger views of two examples for the three
sets
In the training process, since the loss function cannot directly determine the final quality of training,
it was necessary to observe the output of the model after each epoch. During pix2pixHD training, an
input was randomly selected for testing and recorded after each epoch was completed. In model 1
training, the output changed from fuzzy to clear (Fig. 6-7).
Figure 6-7: One randomly generated image from each epoch of model 1_2 (above); enlarged views
of the start and end of training showing success as the images become less blurry (below)
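To make this per-epoch inspection less tedious, the intermediate samples that pix2pixHD writes under its checkpoints folder can be stitched into a single strip, similar to Figure 6-7. The sketch below assumes the default checkpoints/<name>/web/images layout and the epochXXX_synthesized_image.jpg naming of the pix2pixHD visualizer; the experiment name is a placeholder.

# Minimal sketch: collect one synthesized sample per epoch and paste them side by side
# so training progress (blurry to clear) can be judged at a glance.
import glob
from PIL import Image

files = sorted(glob.glob("checkpoints/office_model1/web/images/epoch*_synthesized_image.jpg"))
thumbs = [Image.open(f).resize((128, 128)) for f in files]

strip = Image.new("RGB", (128 * max(len(thumbs), 1), 128), "white")
for i, im in enumerate(thumbs):
    strip.paste(im, (i * 128, 0))
strip.save("model1_training_progress.jpg")
print("epochs found:", len(thumbs))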
The purpose of training the three models was comparison. After model 1 and model 2 were nested,
the function realized by the nested model was the same as that of model 3. After comparing the two,
the nested model produced higher-quality results, so the nested model was selected (Fig. 6-8).
Figure 6-8: Nested model vs. model 3
The final test of the nested model on actual inputs was successful (Fig. 6-9).
Figure 6-9: Test process of nested model
At the same time, the results of model 1 were read by Dynamo, and the information in the color
blocks of the picture was directly translated into the family type and location of the furniture in
the Revit model. Dynamo realized the automatic furniture placement function by reading the
pictures (Fig. 6-10). Only a single sample floor plan was used to test the feasibility of this step.
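The thesis performed this translation inside Dynamo; as a rough Python equivalent of the same idea, the sketch below maps each furniture color in a synthesized image to a family name and a centroid position in room coordinates. The color legend and the pixels-per-foot scale are hypothetical (the scale could only be estimated, as noted in Section 6.4.3).

# Minimal sketch (assumed color legend and scale): read color blocks from the synthesized
# image and convert them to furniture types and placement coordinates.
import numpy as np
from PIL import Image

COLOR_TO_FAMILY = {(255, 0, 0): "Desk", (0, 0, 255): "Chair", (0, 255, 0): "Cabinet"}
PIXELS_PER_FOOT = 10.0   # estimated scale between image pixels and model feet

img = np.array(Image.open("synthesized_room.png").convert("RGB"))
for color, family in COLOR_TO_FAMILY.items():
    mask = np.all(img == color, axis=-1)
    if mask.any():
        ys, xs = np.nonzero(mask)
        # centroid of all pixels of this color, converted to feet
        print(f"{family}: place at ({xs.mean() / PIXELS_PER_FOOT:.1f} ft, {ys.mean() / PIXELS_PER_FOOT:.1f} ft)")

Note that this simple version treats all pixels of one color as a single piece of furniture, which is exactly why two blocks of the same color are a problem (Section 6.4.3, item 1).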
Figure 6-10: Synthesized image to Revit 3D model
6.3 Limitations
Every step in the process can be improved. This section explains the limitations (in the training
set, pix2pixHD, blurry images, and validation); improvement plans are proposed in the future
work section.
6.3.1 Training Set
A large amount of training data is usually a prerequisite for a machine learning model to exert
its advantages; in a real sense, the training set determines what the model can learn. But in this
study, the process of obtaining the training set was not smooth, and the results were not ideal.
In an ideal state, the Dynamo code should be able to automatically export the ideal training set
pictures from the architecture firm's database of building information models. However, this ideal
plan ignores the time cost and software stability required to process a model with a huge amount of
data. The massive amount of data contained in BIM software lays the foundation for machine
learning to enter the field of architecture. However, the large and complex data became the most
serious flaw for several reasons: the data in the building database was not structured consistently
enough to extract the images needed, the amount of data took a long time to process, and the
number of images needed necessitated using Photoshop to create more floor plans by rotating and
mirroring the original data set. To supplement the number of training images, a lot of time was
spent manually processing the data and mirroring the resulting plans. The number of images was
increased, but the diversity was not.
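A scripted alternative to the manual Photoshop work would be to apply the rotations and mirrors in batch, as sketched below; the folder names are hypothetical, and the same transformation must of course be applied to the corresponding images in every group so the pairs stay aligned. This increases the count of training images but, as noted, not their diversity.

# Minimal sketch (assumed folders): derive eight variants of each plan
# (four rotations, each optionally mirrored).
import os
from PIL import Image

src_dir, out_dir = "plans_original", "plans_augmented"
os.makedirs(out_dir, exist_ok=True)

for name in os.listdir(src_dir):
    base, ext = os.path.splitext(name)
    img = Image.open(os.path.join(src_dir, name))
    for k in range(4):                                   # 0, 90, 180, 270 degrees
        rot = img.rotate(90 * k, expand=True)
        rot.save(os.path.join(out_dir, f"{base}_r{90 * k}{ext}"))
        rot.transpose(Image.FLIP_LEFT_RIGHT).save(os.path.join(out_dir, f"{base}_r{90 * k}_m{ext}"))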
At the same time, the complexity of the office spaces is low, and most of the spaces in the
training set are single offices. Originally, the intent was to use classrooms because they had more
diversity, but not enough drawings were available to create a reasonable training set. Nevertheless,
pix2pixHD is fully capable of dealing with more complex indoor layouts with different spatial uses,
such as apartments, through nested models.
6.3.2 Pix2pixHD
The decline in the quality of the training set directly leads to a loss of quality in the synthesized
images, which is mainly reflected in the versatility and accuracy of the model.
1. Good versatility means that the model can generate discernible results for inputs not seen in
the training set.
2. Good accuracy means that the model can generate a distinguishable furniture layout.
A larger training set with multiple styles could fundamentally improve both properties; the
existing training set, however, falls short in these two respects.
6.3.3 Blurry Images
Based on the results generated by the machine learning model, this research achieved two functions:
creating nested machine learning models for generating a floor plan of a room, and using Dynamo
code to read the machine learning results and place furniture in the Revit office model. Both steps
require the machine learning model to generate high-quality pictures. However, in the absence of a
high-quality training set, the versatility and accuracy of the machine learning model were greatly
affected, which ultimately led to blurred boundaries in the floor plans generated by the nested model.
As a result, the Dynamo code could not accurately calculate the position of the furniture when
reading information from pixels (Fig. 6-11).
Figure 6-11: A blurry synthesized image from model 1 caused a door to be placed in the wall
6.3.4 Validation
Although all the data was recommended by the architecture firm, it has not been evaluated, which
means there is no way to know whether a given room has a good layout. Most of the rooms are
single-occupant offices, so the layouts in the original model may not have been carefully designed.
Input from the original layout designers would definitely be helpful when analyzing the training
results of the model.
6.4 Future Work
The first thing to be done as future work is fixing the limitations discussed in Section 6.3,
including the poor quality of the training set, the low versatility of the model, incorrect placement
caused by blurry synthesized images, and the lack of validation. Then more work can be done. A
few improvements would be to automate the entire process with fewer manual interventions,
optimize the training set, create the 3D Revit models, develop a one-to-many neural network, use
more diverse floor plans, and evaluate the floor plans.
6.4.1 Entire Process Automation
The functions implemented at this stage span Revit and Dynamo, model 1 and model 2, and the
import of results back into Revit, and the continuity between these steps is very poor, so it is
important to integrate the entire process into a Revit plug-in. Under ideal circumstances, exporting
the data from the BIM database, checking and standardizing furniture and office names, creating
views at the same scale, creating the jpg images, running the trained generator (G) of the
pix2pixHD model, and importing the results back through Dynamo could all be done automatically,
integrating the model export (Dynamo), generation (generator), and import (Dynamo) through code.
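Short of a full Revit plug-in, a first step toward this integration could be a driver script that chains the three stages from the command line, as sketched below. The Dynamo graph names and the headless Dynamo invocation are hypothetical; the test.py flags (--name, --dataroot, --label_nc 0, --no_instance) follow the public pix2pixHD repository.

# Minimal sketch (hypothetical Dynamo runner and paths): export from Revit, run the trained
# generator, and import the results back, without manual hand-offs between the stages.
import subprocess

subprocess.run(["DynamoCLI", "-o", "export_rooms.dyn"], check=True)       # 1. export room images (hypothetical graph)
subprocess.run(["python", "test.py",                                      # 2. pix2pixHD inference
                "--name", "office_model1",
                "--dataroot", "./datasets/office_export",
                "--label_nc", "0", "--no_instance"], check=True)
subprocess.run(["DynamoCLI", "-o", "import_furniture.dyn"], check=True)   # 3. place furniture back in Revit (hypothetical graph)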
6.4.2 Training Set Optimization
At this stage, the training set derivation process is the most important factor hindering the overall
research quality, so trying a variety of other methods to produce the training set will be the first
step in future work. The first choice would be to optimize the Dynamo code. These methods of
data set generation are closer to data management and mining, requiring considerable knowledge
and technique. At the same time, optimizing the training set export can save time, which means
that more images can be exported and the process would be more stable. Improving the quality of
the training set improves the quality of the generated pictures, and, assuming sufficient time,
higher-resolution training (1024*1024 pixels) could be carried out.
When the quality of the generated image becomes higher, the quality of the 3D Revit office layout
based on the pixel information of the generated image will also improve.
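As an indication of what higher-resolution training would involve, the sketch below follows the option style shown in Appendix A and assumes the --loadSize / --fineSize options of the public pix2pixHD repository; memory limits would likely force a batch size of 1.

# Minimal sketch (assumed pix2pixHD options): raise the training resolution to 1024 x 1024
# once a larger and more stable training set export is available.
from options.train_options import TrainOptions

opt = TrainOptions().parse()
opt.loadSize = 1024    # scale input images to 1024 px
opt.fineSize = 1024    # train on the full 1024 x 1024 image
opt.batchSize = 1      # larger images usually require a smaller batch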
6.4.3 Revit 3D Model Creation
Although the framework for 3D model generation has been completed for a single case study, it
cannot yet be widely applied. There are several remaining problems:
1. It is not acceptable to have two or more blocks of the same color in the room.
2. The ratio between the room and the picture cannot be calculated; it can only be estimated.
3. For a 1024*1024 pixel synthesized image, reading the pixels takes a long time.
4. If the image of a piece of furniture in the synthesized image is irregular, for example an
L-shaped workstation (Fig. 6-12), the algorithm to find the center point is more complex, and the
center point of irregular furniture cannot easily be confirmed (one possible approach is sketched
below). This center point is necessary for several uses, including having the image of the furniture
correspond to a 3D family so it can be placed correctly in the 3D model.
Figure 6-12: L-shaped table in a synthesized image
5. The direction of the furniture cannot be changed when placing the furniture.
If this research needs to be applied in practice, all the above problems need to be overcome.
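For problems 1 and 4, one possible approach (a sketch under assumed colors, not the Dynamo code used in this thesis) is to split each furniture color into connected blocks and then pick, for each block, the interior pixel farthest from the block boundary; unlike the plain centroid, that point always lies inside an L-shaped block.

# Minimal sketch: connected-component labeling separates blocks of the same color,
# and a distance transform gives an anchor point that stays inside irregular shapes.
import numpy as np
from PIL import Image
from scipy import ndimage

img = np.array(Image.open("synthesized_room.png").convert("RGB"))
mask = np.all(img == (255, 0, 0), axis=-1)            # all pixels of one (assumed) furniture color

labels, n = ndimage.label(mask)                       # split into separate furniture blocks
for i in range(1, n + 1):
    block = labels == i
    dist = ndimage.distance_transform_edt(block)      # distance of each block pixel to the boundary
    y, x = np.unravel_index(np.argmax(dist), dist.shape)
    print(f"block {i}: anchor pixel at ({x}, {y})")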
6.4.4 One-to-Many Neural Network
The shortcomings of GAN here are obvious: this kind of "one-to-one" neural network can only
give one plan for a given boundary, whereas architects can often design multiple schemes for
clients. To be a useful tool in actual practice, it must meet that need. One would need to extend
the GAN through further development and train it into a hybrid neural network. The artificial
intelligence model should not only recognize what it is looking at, for example an office plan,
from the surface image, but really learn the vector logic behind the layout and generate a variety
of plans in a mode closer to the thinking of an architect.
6.4.5 Floor Plan Evaluation
Part of the reason to generate multiple designs is to discover "new," "better," or "more appropriate"
choices. New might include layouts that the designer had not thought of previously. Better or
more appropriate might mean applying the layouts to some sort of evaluation system that narrows
the choices down to ones with better lighting, less glare, or a "more efficient" furniture layout. As
a future application for machine learning, this has a lot of potential.
This research derived the room layout automatically from the doors, windows, and room shape.
In actual situations, many other factors affect the room layout, including function and acoustics.
If those influencing factors could be visualized, they could be encoded into the Group 1 pictures
for learning as well.
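As a purely illustrative example of such an evaluation pass, the sketch below ranks candidate layouts by the share of desks that fall within a chosen distance of a window; every name, coordinate, and the 15-foot threshold are invented for illustration, not taken from this research.

# Minimal hypothetical sketch: score generated layouts by desk-to-window proximity
# and keep the best-scoring candidate.
from math import dist

def daylight_score(desks, windows, max_ft=15.0):
    # fraction of desks whose nearest window is within max_ft
    if not desks:
        return 0.0
    near = sum(1 for d in desks if min(dist(d, w) for w in windows) <= max_ft)
    return near / len(desks)

candidates = {
    "layout_A": {"desks": [(3, 4), (9, 4), (15, 4)], "windows": [(0, 4), (18, 4)]},
    "layout_B": {"desks": [(9, 9), (9, 12), (9, 15)], "windows": [(0, 4)]},
}
best = max(candidates, key=lambda k: daylight_score(**candidates[k]))
print("preferred layout:", best)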
6.5 Conclusion
Regarding the rationality of the office layouts, first of all, the offices themselves are not
complicated, and secondly, the diversity of the training set is not high, so the final prediction
model generated no surprising designs. Most of the prediction results are consistent with the
layouts in the training set.
The training set is critical to machine learning. Because of an inadequate understanding of the
details of the process at the beginning, only one method and one piece of software were chosen
to generate the room floor plans, which made the subsequent data set export process
time-consuming, of poor quality, and small in quantity. Like dominoes, once the training set fell
short, there were problems from model training through to the application of the results.
However, in the end a reasonably good result was still obtained, because the office layouts in the
training set are themselves not complicated; a lower-quality training set can still accomplish the
purpose of training. The nested model produced a higher-quality and more reasonable layout than
a floor plan generated directly from the shape of the room. This process shows that marking
images in a way that can be read by the model can improve the quality of the output. Although
the quality is only fair for this specific application of machine learning to generating offices, and
there is still a very long distance to an actual application for designers, it is possible to obtain
reasonable results going from BIM to machine learning and back to BIM.
The technology of image recognition and generation is still in its infancy while the computing
power of ordinary computers has not yet reached the next level. The cost of data processing and
extraction, and the fact that generation quality cannot be parameterized during training, remain
obstacles; such a high-cost, unstable, and low-return method will not be widely adopted. As
computing power continues to increase, machine learning algorithms will become more feasible
on personal computers, or cloud computing might be the answer. With faster computers,
innovative ideas about how to apply machine learning to the building industry, and educated
users, it will be possible to solve certain types of problems with artificial intelligence in the
design and construction field. In the near future, it will surely become a powerful tool, just as
CAD first appeared in the field of architecture as a tool, and then BIM.
References
Manyika, J., Lund, S., Chui, M., Bughin, J., Woetzel, J., Batra, P., et al. (2017). Jobs lost, jobs gained: What the future of work will mean for jobs, skills, and wages. McKinsey Global Institute. Retrieved from https://www.mckinsey.com/featured-insights/future-of-work/jobs-lost-jobs-gained-what-the-future-of-work-will-mean-for-jobs-skills-and-wages#
Chaillou, S. (2019, February 24). AI & Architecture. Towards Data Science. Retrieved from https://towardsdatascience.com/ai-architecture-f9d78c6958e0
Zheng, H., & Huang, W. (2018). Understanding and visualizing generative adversarial networks in architectural drawings. In Learning, Prototyping and Adapting: Short Paper Proceedings of the 23rd International Conference on Computer-Aided Architectural Design Research in Asia (CAADRIA), (5), 12-57.
Haynes, B. P. (2007). An evaluation of office productivity measurement. Journal of Corporate Real Estate, 9(3), 144-155.
Duffy, F., Jaunzens, D., Laing, A., & Willis, S. (1998). New Environments for Working. Routledge. Retrieved from https://www.routledge.com/New-Environments-for-Working/Duffy-Jaunzens-Laing-Willis/p/book/9780419209904
Lee, C. (2008). BIM changing the construction industry. Stanford: John Wiley & Sons, Inc.
Moreno, C., Olbina, S., & Issa, R. R. (2019). BIM use by architecture, engineering, and construction (AEC) industry in educational facility projects. Advances in Civil Engineering, (2), 1-19.
Xu, H., Feng, J., & Li, S. (2014). Users-orientated evaluation of building information model in the Chinese construction industry. Automation in Construction, 39, 32-46.
Goedert, J., & Meadati, P. (2008). Integrating construction process documentation into building information modeling. Journal of Construction Engineering and Management (ASCE), 134, 509-516.
Bouazza, T., Udeaja, C., & Greenwood, D. (2015). The use of Building Information Modelling (BIM) in managing knowledge in construction project delivery: Conceptual model. BIM, 12(11), 15-19.
Lin, Y. C., & Su, Y. C. (2013). Developing mobile- and BIM-based integrated visual facility maintenance management system. The Scientific World Journal, 124-249.
Li, J., Hou, L., Wang, X., et al. (2014). A project-based quantification of BIM benefits. International Journal of Advanced Robotic Systems, 11(8).
Boeykens, S. (2012). Bridging building information modeling and parametric design. In eWork and eBusiness in Architecture, Engineering and Construction: 9th ECPPM Conference Proceedings, 112-154.
Bringsjord, S., Khemlani, S., Arkoudas, K., McEvoy, C., Destefano, M., & Daigle, M. (2005). Advanced synthetic characters, evil. International Conference on Game-On, 11, 114-136.
McPherson, S. S. (2018). Artificial Intelligence: Building Smarter Machines (1st ed.). New York: Twenty-First Century Books.
Molnar, C. (2020). Interpretable Machine Learning. Self-published.
Mitchell, T. M. (1997). Machine Learning. New York: McGraw-Hill.
Li, J., Wong, Y., Zhao, Q., & Kankanhalli, M. S. (2017). Unsupervised learning of view-invariant action representations. arXiv preprint.
Isola, P., Zhu, J.-Y., Zhou, T., & Efros, A. A. (2017). Image-to-image translation with conditional adversarial networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Cohen, J. L. (2014). Le Corbusier's Modulor and the debate on proportion in France. Architectural Histories, 2(1).
Schumacher, P. (2016). Advancing social functionality via agent-based parametric semiology. Architectural Design, 86(2), 108-113.
Bucci, F., & Mulazzani, M. (2002). Luigi Moretti: Works and Writings. Princeton Architectural Press.
Manyika, J., Chui, M., Miremadi, M., Bughin, J., George, K., Willmott, P., et al. (2017). A future that works: Automation, employment, and productivity. McKinsey Global Institute, 11, 15-58.
Anderson, C., Bailey, C., Heumann, A., & Davis, D. (2018). Augmented space planning: Using procedural generation to automate desk layouts. International Journal of Architectural Computing, 16(2), 164-177.
APPENDICES
Appendix A: Pix2pixHD code analysis train.py
import time
import os
import numpy as np
import torch
from torch.autograd import Variable
from collections import OrderedDict
from subprocess import call
import fractions
from options.train_options import TrainOptions
from data.data_loader import CreateDataLoader
from models.models import create_model
import util.util as util
from util.visualizer import Visualizer
opt = TrainOptions().parse() # Import training parameters
iter_path = os.path.join(opt.checkpoints_dir, opt.name, 'iter.txt') # Settings: the checkpoint path; opt.name defaults to label2city. The epoch and its index are saved in iter.txt
if opt.continue_train: # The default value is True. Continue training.
try:
start_epoch, epoch_iter = np.loadtxt(iter_path, delimiter=',', dtype=int) # Load the current epoch and epoch_iter
except:
start_epoch, epoch_iter = 1, 0# if an exception occurs, set the two to 1, 0
print('resume from epoch %d at iteration %d' % (start_epoch, epoch_iter)) # print which epoch and iteration training is resumed from
else:
start_epoch, epoch_iter = 1, 0 # If continue_train is False, training starts directly from 1, 0.
def lcm(a, b): return abs(a * b) / fractions.gcd(a, b) if a and b else 0 # least common multiple of a and b; e.g. gcd(100, 3) = 1, so 100*3/1 = 300 and the printing frequency is 300
opt.print_freq = lcm(opt.print_freq, opt.batchSize)# printing frequency and batch
if opt.debug:# parameter debugging
opt.display_freq = 1# display frequency
opt.print_freq = 1# printing frequency
opt.niter = 1 # number of iterations at the starting learning rate
opt.niter_decay = 0 # number of iterations over which to linearly decay the learning rate to zero
opt.max_dataset_size = 10
### Load a dataset
data_loader = CreateDataLoader(opt) # load dataset
dataset = data_loader.load_data() # load data by calling the load_data() function of the custom data loader
dataset_size = len(data_loader)# dataset length
print('#training images = %d' % dataset_size) # print how many photos are in the dataset (the Cityscapes street-view set has 2975)
model = create_model(opt)# create a model based on input parameters
visualizer = Visualizer(opt)# visualization operations
if opt.fp16:# operations related to accelerated computing
from apex import amp
model, [optimizer_G, optimizer_D] = amp.initialize(model, [model.optimizer_G,
model.optimizer_D], opt_level='O1')
model = torch.nn.DataParallel(model, device_ids=opt.gpu_ids)
else:
optimizer_G, optimizer_D = model.module.optimizer_G, model.module.optimizer_D
total_steps = (start_epoch-1) * dataset_size + epoch_iter# total number of steps to run
display_delta = total_steps % opt.display_freq # remainder operation, used in the if judgments below
print_delta = total_steps % opt.print_freq # as above
save_delta = total_steps % opt.save_latest_freq # as above
for epoch in range(start_epoch, opt.niter + opt.niter_decay + 1):
epoch_start_time = time.time()
if epoch != start_epoch: # if the epoch is not the loaded epoch, update epoch_iter
epoch_iter = epoch_iter % dataset_size # remainder operation: current step % dataset length
for i, data in enumerate(dataset, start=epoch_iter): # epoch_iter here is the starting index into the dataset list
if total_steps % opt.print_freq == print_delta: # record the start time of a print batch
iter_start_time = time.time()
total_steps += opt.batchSize # total number of steps recorded
epoch_iter += opt.batchSize # record the current number of steps
# whether to collect output images
save_fake = total_steps % opt.display_freq == display_delta # bool: whether to save fake pictures
############## Forward Pass ######################
# Call the Pix2PixHDModel () function in class BaseModel (forward) and enter four types of data
sets (I only use label and imgs); Return loss and fake_img
losses, generated = model(Variable(data['label']), Variable(data['inst']), Variable(data['image']),
Variable(data['feat']), infer=save_fake)
# sum per device losses
### isinstance() function to determine whether an object is a known type, similar to type().
# isinstance() is different from type():
# type() does not consider a subclass as a parent class type, regardless of inheritance.
# isinstance() considers a subclass as a parent class type and considers inheritance.
# If you want to determine whether the two types are the same, isinstance() is recommended.
losses = [torch.mean(x) if not isinstance(x, int) else x for x in losses]# if x is not of int type, the
mean value is calculated. if it is int, x is returned directly.
loss_dict = dict(zip(model.module.loss_names, losses)) # first use zip() to map elements one by one, then use dict to create a dictionary
# calculate final loss scalar
loss_D = (loss_dict['D_fake'] + loss_dict['D_real']) * 0.5 # discriminator loss: the average of the real and fake discriminant losses
loss_G = loss_dict['G_GAN'] + loss_dict.get('G_GAN_Feat', 0) + loss_dict.get('G_VGG', 0) # generator loss; the feature-matching and VGG terms are not used here, so only G_GAN contributes
############### Backward Pass ####################
# update generator weights
optimizer_G.zero_grad() # clear the generator optimizer gradients
if opt.fp16: # fp16 and AMP belong to mixed-precision acceleration (Apex library provided by Nvidia)
with amp.scale_loss(loss_G, optimizer_G) as scaled_loss: scaled_loss.backward()
else:
loss_G.backward()
optimizer_G.step() # optimizer gradient step for the generator
# update discriminator weights
optimizer_D.zero_grad() # clear the discriminator optimizer gradients
if opt.fp16:
with amp.scale_loss(loss_D, optimizer_D) as scaled_loss: scaled_loss.backward()
else:
loss_D.backward()
optimizer_D.step() # optimizer gradient step for the discriminator
############## Display results and errors ##########
### print out errors
if total_steps % opt.print_freq == print_delta:
errors = {k: v.data.item() if not isinstance(v, int) else v for k, v in loss_dict.items()}
# This is a shorthand method of dictionary + for loop traversal
t = (time.time() - iter_start_time) / opt.print_freq
visualizer.print_current_errors(epoch, epoch_iter, errors, t)
visualizer.plot_current_errors(errors, total_steps)
#call(["nvidia-smi", "--format=csv", "--query-gpu=memory.used,memory.free"])
### display output images
if save_fake:
### OrderedDict() is an ordered Dictionary:
https://www.cnblogs.com/gide/p/6370082.html
# Many people think that dictionaries in python are unordered because they are stored by hash,
# But python has a module collections (English, collection, collection), which comes with a
subclass
# OrderedDict to sort elements in dictionary objects.
visuals = OrderedDict([('input_label', util.tensor2label(data['label'][0], opt.label_nc)),
('synthesized_image', util.tensor2im(generated.data[0])),
('real_image', util.tensor2im(data['image'][0]))])
visualizer.display_current_results(visuals, epoch, total_steps) # save images and update the web page
### save latest model
if total_steps % opt.save_latest_freq == save_delta:
print('saving the latest model (epoch %d, total_steps %d)' % (epoch, total_steps)) # save the latest model
### Save and load Pytorch models
# For example, we created a model: model = MyVggNet().
# If you use multi-GPU training, we need to use this line of code: model =
nn.DataParallel(model).cuda()
# After executing this code, the model is not our original model, but is equivalent to adding a shell
outside our original model that supports GPU operation,
# At this time, the real model object is: real_model = model.module,
model.module.save('latest') # call the save() function in Pix2PixHDModel to save the latest network model
np.savetxt(iter_path, (epoch, epoch_iter), delimiter=',', fmt='%d')
if epoch_iter >= dataset_size:
break
# end of epoch
iter_end_time = time.time()
print('End of epoch %d / %d \t Time Taken: %d sec' %
(epoch, opt.niter + opt.niter_decay, time.time() - epoch_start_time))
### save model for this epoch
if epoch % opt.save_epoch_freq == 0:
print('saving the model at the end of epoch %d, iters %d' % (epoch, total_steps))
model.module.save('latest') # save the latest network model
model.module.save(epoch)# save the network model, such as 10_net_G;10_net_D
np.savetxt(iter_path, (epoch+1, 0), delimiter=',', fmt='%d')
### instead of only training the local enhancer, train the entire network after certain iterations
if (opt.niter_fix_global != 0) and (epoch == opt.niter_fix_global):
model.module.update_fixed_params()
### linearly decay learning rate after certain iterations #
Linear decay learning rate after a specific iteration
if epoch > opt.niter:
model.module.update_learning_rate()
Appendix B : Pix2pixHD code analysis test.py
import os
from collections import OrderedDict
from torch.autograd import Variable
from options.test_options import TestOptions
from data.data_loader import CreateDataLoader
from models.models import create_model
import util.util as util
from util.visualizer import Visualizer
from util import html
import torch
opt = TestOptions().parse(save=False)
opt.nThreads = 1 # test code only supports nThreads = 1
opt.batchSize = 1 # test code only supports batchSize = 1
opt.serial_batches = True # no shuffle: load by batch without shuffling
opt.no_flip = True # no flip
data_loader = CreateDataLoader(opt)
dataset = data_loader.load_data()
visualizer = Visualizer(opt)
# create website
web_dir = os.path.join(opt.results_dir, opt.name, '%s_%s' % (opt.phase, opt.which_epoch))
# './results/'+'label2city'+'test_latest'
webpage = html.HTML(web_dir, 'Experiment = %s, Phase = %s, Epoch = %s' % (opt.name,
opt.phase, opt.which_epoch)) # Create an HTML file to view test results
# test
if not opt.engine and not opt.onnx:
model = create_model(opt)
if opt.data_type == 16:
model.half()
elif opt.data_type == 8:
model.type(torch.uint8)
if opt.verbose:
print(model) # Print model
else:
from run_engine import run_trt_engine, run_onnx
for i, data in enumerate(dataset):
if i >= opt.how_many:
break
if opt.data_type == 16:
data['label'] = data['label'].half()
data['inst'] = data['inst'].half()
elif opt.data_type == 8:
data['label'] = data['label'].uint8()
data['inst'] = data['inst'].uint8()
if opt.export_onnx:
print ("Exporting to ONNX: ", opt.export_onnx)
assert opt.export_onnx.endswith("onnx"), "Export model file should end with .onnx"
torch.onnx.export(model, [data['label'], data['inst']],
opt.export_onnx, verbose=True)
exit(0)
minibatch = 1
if opt.engine:
generated = run_trt_engine(opt.engine, minibatch, [data['label'], data['inst']])
elif opt.onnx:
generated = run_onnx(opt.onnx, opt.data_type, minibatch, [data['label'], data['inst']])
else:
generated = model.inference(data['label'], data['inst'], data['image'])
visuals = OrderedDict([('input_label', util.tensor2label(data['label'][0], opt.label_nc)),
('synthesized_image', util.tensor2im(generated.data[0]))])
img_path = data['path']
print('process image... %s' % img_path)
visualizer.save_images(webpage, visuals, img_path)
Abstract
Driven by machine learning and faster computers, automation and artificial intelligence (AI) will profoundly impact many industries, from self-driving trucks to personal assistants that manage schedules and financial applications that replace personal accounting. By systematizing repetitive tasks, automation can improve cost savings, reliability, and productivity while allowing humans to focus on higher-value and more complex tasks. Machine learning provides opportunities for automation in the building design process. A newer type of machine learning model, the Generative Adversarial Network (GAN), makes it possible to generate floor plans automatically.

Based on the BIM data set of an architecture firm, Dynamo (a visual programming language in Revit) was used to obtain a training set and a test set, two groups of paired pictures. A variant of the GAN called pix2pixHD was trained on the training set, a process of learning the relationship between two paired images. Given a certain room shape, a floor plan with all furniture locations is predicted. The predicted result is then reversed and input back into Revit as rooms with furniture.

Unforeseen problems led to workarounds in the process. The training set could not be fully automatically exported by Dynamo from the BIM database; it required manual operation with the assistance of Photoshop. The quality of the final training set was not as high as hoped for, but the model could still predict the layout of an office. The function of placing furniture in the Revit model was applied to a simple office room but could not be applied to a more complex office. The theoretical process can be achieved, but with more work. The actual process used did provide the desired outcome of using machine learning to create office plans.