Detection, Localization, and Repair of Internationalization Presentation Failures in Web Applications

by

Abdulmajeed Alameer

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)

August 2019

Copyright 2019 Abdulmajeed Alameer

Dedication

To my mother, Norah, for her endless love and support.

Acknowledgements

My journey as a Ph.D. student at USC has been one of the most enlightening experiences in my life. At the end of this journey, I would like to acknowledge a number of people who supported me in reaching this point.

First and foremost, I would like to thank my advisor, Professor William G. J. Halfond, for his endless encouragement and support throughout all stages of my Ph.D. journey. His continuous feedback on my ideas, implementation, paper writing, and presentations shaped my research skills and helped me finish this dissertation. I have been accompanied by his kindness and positive attitude since my first days as a Ph.D. student. I will always be thankful to him for all the time and feedback he provided me.

Besides my advisor, I would also like to thank the rest of my dissertation committee: Prof. Nenad Medvidovic, Prof. Sandeep Gupta, Prof. Chao Wang, and Prof. Jyotirmoy Deshmukh, for their valuable and constructive feedback. I would also like to thank Prof. Phil McMinn from the University of Sheffield for being a great collaborator, and for all the feedback and knowledge he shared with me and my other co-authors.

I would also like to thank my labmates, Ding Li, Sonal Mahajan, Jiaping Gui, Mian Wan, Yingjun Lyu, Negarsadat Abolhassani, Ali Alotaibi, and Paul Chiou for all the fun moments and happy memories we had at the lab. I am so grateful for having them as labmates. They have always been supportive and encouraging.
I would like to thank my family and my friends for their continuous encouragement and support since the beginning of my journey as a Ph.D. student. Last but not least, I would like to thank the staff of Joe and the Juice on Melrose Avenue for the enormous supply of caffeine they provided me, which kept me up and focused while writing this dissertation.

Table of Contents

Dedication
Acknowledgements
List of Tables
List of Figures
Abstract
Chapter 1: Introduction
  1.1 Major Challenges
  1.2 Insights and Hypothesis
    1.2.1 Insight 1: Visual relationships between baseline web page elements can be used to detect Internationalization Presentation Failures (IPFs)
    1.2.2 Insight 2: Correct layout of a web page can be defined as a system of constraints
    1.2.3 Hypothesis
  1.3 Contributions
  1.4 Overview of Publications
Chapter 2: Overview of Dissertation
Chapter 3: Detecting and Localizing Internationalization Presentation Failures
  3.1 Approach
    3.1.1 Layout Graph Definition
    3.1.2 Building the Layout Graph
    3.1.3 Layout Graph Comparison
    3.1.4 Ranking the Likely Faulty Elements
  3.2 Evaluation
    3.2.1 Implementation
    3.2.2 Subject Applications
    3.2.3 Experiments and Results
    3.2.4 Threats to Validity
  3.3 Summary
Chapter 4: Repairing Internationalization Presentation Failures
  4.1 Approach
    4.1.1 Step 1: Extracting visual relationships between elements
    4.1.2 Step 2: Converting visual relationships to constraints
    4.1.3 Step 3: Solving constraints and producing a repair
  4.2 Evaluation
    4.2.1 Implementation
    4.2.2 Subjects
    4.2.3 Experiment One
    4.2.4 Experiment Two
    4.2.5 Threats to Validity
  4.3 Summary
Chapter 5: Related Work
  5.1 Detection Techniques
  5.2 Repair Techniques
  5.3 Techniques for Supporting Internationalization and Web Design
Chapter 6: Conclusion
  6.1 Future Directions
References

List of Tables

3.1 Different mechanisms of hiding HTML elements in a web page
3.2 Layout Graph visual relationships along with their computation for an edge (v, w)
3.3 Results of IPF detection when removing the heuristics used by GWALI
3.4 Execution time of GWALI (in seconds) compared to other approaches
4.1 Constraints that need to be enforced between the variables of two elements e1 and e2 in the Page Under Test (PUT) based on the visual relationships between these elements in the baseline web page
4.2 Subjects used to evaluate CBRepair
4.3 Results for CBRepair effectiveness and efficiency

List of Figures

2.1 Dissertation overview
3.1 Part of a Hotwire web page and an internationalized version having IPFs
3.2 Web page snippets of an original and two internationalized versions
3.3 The Minimum Bounding Rectangles (MBRs) of the snippets shown in Figure 3.2
3.4 The Layout Graphs (LGs) of the MBRs shown in Figure 3.3
3.5 Example of non-failure text changing position after translation
3.6 Example from a page not fully mirrored in its right-to-left version
3.7 Example of an IPF in the header menu of Twitter's help center
3.8 Detection accuracy of the three different strategies to handle Right-to-Left (RTL) language pages
3.9 Detection accuracy of GWALI compared to other approaches
3.10 Example of an IPF where text became overlapping with a button after translation
3.11 Histogram of the ranks reported by GWALI
3.12 Breakdown of the running time of GWALI
4.1 Example of an IPF in a web page translated from English to Italian and illustration of how repairing it can introduce other IPFs
4.2 Overview of my repair approach
4.3 CSS Box Model
4.4 Example of two solutions that satisfy the constraints and repair the IPF in the PUT
4.5 Example of different repair versions of User Interface (UI) Snippets
4.6 Ratings given by user study participants to assess repair quality

Abstract

Web applications can be easily made available to an international audience by leveraging frameworks and tools for automatic translation and localization. However, these automated changes can introduce Internationalization Presentation Failures (IPFs): undesired distortions of the web page's intended appearance that occur as HTML elements expand, contract, or move in order to handle the translated text. It is challenging for developers to design websites that can inherently adapt to the expansion and contraction of text after it is translated to different languages. Existing web testing techniques do not support developers in debugging these types of problems, and manually testing every page in every language can be a labor-intensive and error-prone task.

In my dissertation work, I designed and evaluated two techniques to help developers in debugging web pages that have been distorted due to internationalization efforts. In the first part of my dissertation, I designed an automated approach for detecting IPFs and identifying the HTML elements responsible for the observed problem.
In the evaluation, my approach was able to detect IPFs in a set of 70 web applications with high precision and recall, and was able to accurately identify the underlying elements in the web pages that led to the observed IPFs. In the second part of my dissertation, I designed an approach that can automatically repair web pages that have been distorted due to internationalization efforts. My approach models the correct layout of a web page as a system of constraints. The solution to the system represents the new and correct layout of the web page that resolves its IPFs. The evaluation of this approach showed that it could more quickly produce repaired web pages that were rated as more attractive and more readable than those produced by a prior state-of-the-art technique. Overall, these results are positive and indicate that both my detection and repair techniques can assist developers in debugging IPFs in web applications with high effectiveness and efficiency.

Chapter 1
Introduction

Web applications enable companies to easily offer their services and products on a worldwide basis. Many companies aim to increase their consumer base by offering their web applications in multiple languages. Although this capability provides companies with many benefits, it also introduces the challenges of internationalization. To make the services provided by a web application accessible to international users, developers must make their websites gracefully handle different languages. Translated versions of text can have different lengths and heights, depending on the character set of the target language. The expansion and contraction of text after translation can lead to an Internationalization Presentation Failure (IPF), which is an undesired distortion of the page's intended appearance that occurs as HTML elements expand, contract, or move in order to handle the translated text.
These failures are very common in translated web pages: a recent study showed that they occurred in 77% of translated web pages [19]. An internationalized web application can be built using many approaches; however, two general techniques have become widespread and popular. The first approach is to isolate the "need-to-translate" text and images into separate language-specific resource files. These language-specific resource files are typically created by third-party professional translators. When the page is requested, the user's web browser provides the user's preferred languages, and a server-side framework loads the correct language-specific resource files and inserts them into placeholders in the web page. This mechanism enables developers to support localization for any one of the languages specified in the ISO 639-3 standard, which currently has more than 7,700 entries [12]. The second approach is to use online automated translation services (e.g., Google Website Translator [10]). With this approach, the developers install a plugin in their website. Visitors to the site can then select their desired language from a drop-down box, and the plugin will scan the web page to find all of the textual content. The plugin then sends that text to the online automated translation service, which replies with the translation of the text. The plugin then replaces the original text with the translated text. This approach allows developers to easily make their web applications available in a large number of languages without the need to manually extract and translate the content. Although useful for building internationalized websites, neither of these approaches addresses the problem of maintaining the appearance of the translated version of the page, which can get distorted due to internationalization efforts. Maintaining the attractiveness of a web page is important.
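The first, resource-file approach described above can be sketched in a few lines. This is a minimal illustration, not any particular framework's API; the locale codes, message keys, and translations are invented for the example:

```python
# Minimal sketch of the resource-file localization pattern: per-language
# bundles plus placeholder substitution, with a fallback language for
# keys a translator has not yet covered. All strings here are invented.
MESSAGES = {
    "en": {"greeting": "Welcome back", "checkout": "Proceed to checkout"},
    "de": {"greeting": "Willkommen zurück", "checkout": "Weiter zur Kasse"},
}

def render(template: str, locale: str, fallback: str = "en") -> str:
    """Fill {key} placeholders using the locale's resource bundle."""
    bundle = MESSAGES.get(locale, {})
    # Fall back to the default language for any missing key.
    merged = {**MESSAGES[fallback], **bundle}
    return template.format(**merged)

print(render("<h1>{greeting}</h1> <button>{checkout}</button>", "de"))
# <h1>Willkommen zurück</h1> <button>Weiter zur Kasse</button>
```

Note that nothing in this mechanism constrains how long the substituted text may be, which is exactly why the surrounding layout can break after translation.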
Previous studies have shown that users' impressions of a website can be formed within fifty milliseconds [34] of viewing a page. In fact, several studies have shown that users often base their impressions of trustworthiness and quality, and ultimately their decisions to purchase, on the design and visual appearance of a web page [27, 28]. A large-scale study with more than 2,500 participants [30] showed that nearly half of the participants evaluated the credibility of a website based in part on the aesthetics of its visual design, including the website layout, typography, and font size. Another study [57] showed that nearly 94% of negative website feedback was related to visual design issues rather than the actual content of a website. Software engineering researchers have proposed many techniques to aid developers in debugging appearance-related issues in web applications. Multiple techniques target the detection and repair of different types of presentation issues, including Cross-Browser Testing (XBT) techniques (e.g., [26, 25, 56, 38]), Mobile Friendly Problem (MFP) techniques (e.g., [60, 36]), and general presentation failure detection (e.g., [42, 59]). When it comes to internationalization-related issues, automated support in the research and practitioner community is limited. Existing techniques [23, 22, 52] targeting detection of internationalization problems can only perform a pre-defined set of checks, such as corrupted text or missing translations, and cannot repair IPFs. Apple's pseudo-localization testing [2] attempts to help developers identify IPFs, but requires them to check each page of an app manually. The IPF repair technique IFix [40] needs as input a list of faulty elements in the web page, and generating such a list manually is a labor-intensive and error-prone activity.
In addition, IFix requires extensive time to generate a repair for a single web page, and a large number of its generated repairs have a negative impact on the readability and the attractiveness of the web page.

1.1 Major Challenges

Debugging IPFs is a labor-intensive and error-prone process. It involves three steps: detecting, localizing, and repairing the IPFs. Each of these three steps poses several challenges for testers and developers.

First, detection of IPFs is a difficult task. Testers typically need to manually look at the User Interfaces (UIs) of the translated web pages and ensure that they conform to their intended appearance (i.e., that they match the appearance of the non-translated web pages). This is especially challenging for large web applications that contain many web pages and support multiple languages. Testers need to manually inspect each element of each web page in every language the web application supports. Such a process can be time consuming and error prone; it is easy for testers to miss an IPF that is exhibited in a web page.

Second, localizing the elements that are causing IPFs in the web page is challenging. In modern web applications, web pages typically consist of hundreds or thousands of HTML elements. The visual properties and the rendering of these elements are determined by the browser based on a combination of HTML, CSS, and JavaScript code. Once an IPF has been detected, its underlying root cause can be difficult to determine, since the appearance of modern web pages is controlled by a complex rendering interaction of HTML, CSS, and JavaScript. This means that the connection between an observed failure and the underlying fault is often not straightforward. These complications make localization of IPFs a time-consuming and error-prone process.

Third, repairing IPFs that are detected in a web page also poses several challenges. The first challenge is the size of the solution space.
A single web page can contain thousands of HTML elements, each with several CSS properties that range over a large set of possible values controlling the appearance of that element. Repairing an IPF involves finding correct values for the CSS properties of the right set of HTML elements so that the translated web page has an appearance that replicates the baseline web page while adapting to the internationalized text. The second challenge is that a repair must be carefully crafted by adjusting HTML elements in the web page without introducing new IPFs. Introducing additional failures when attempting to repair an IPF can easily occur due to the complex interactions between the elements in a web page. In other words, in order to repair one IPF, developers need to consider not only the elements that are involved in the IPF but also other elements that need to be adjusted to make the whole web page IPF-free. The third challenge is that the repair needs to be constructed so that it does not compromise the attractiveness and the readability of the web page. For example, a possible repair to an IPF could simply involve reducing the font size of the translated text. However, such a repair could lead to decreased readability of the text in the repaired web page.

1.2 Insights and Hypothesis

The challenges software developers face in debugging IPFs motivate the goal of my dissertation, which is automating the process of debugging IPFs. In this section, I present the key insights that led to the research and the hypothesis that this dissertation tests in order to realize the goal.

1.2.1 Insight 1: Visual relationships between baseline web page elements can be used to detect IPFs

To address the detection challenges discussed above, a naive approach could involve pixel-to-pixel image comparison between the translated web page and the baseline web page. However, such an approach would not be effective, for two reasons.
First, the text in the translated web page has a different alphabet, and a pixel-to-pixel comparison will flag these changes in the alphabet as IPFs, resulting in many false positives. Second, the translated web pages inherently have many small differences in the sizes of the elements from the baseline web pages, caused by the minor changes in text size after translation. Image comparison would falsely report these minor differences in the elements' sizes as IPFs. To address this problem, my key insight is that we can create models that abstract away changes in the alphabet of the text and minor changes in the elements' sizes. Such models need to capture only the impactful changes that distort the layout of the web page, leading to an IPF. We can model the correct appearance of a web page by identifying the visual relationships between baseline web page elements. For example, elements in a web page can have visual relationships between them such as "east of", "intersects", "aligns with", and "contains". By building visual relationship models for both the baseline and the translated web page in this way, we can compare these models and determine if there is a discrepancy between the appearance of the baseline and the translated page, indicating a potential IPF. Building and comparing these models can be fully automated, which makes the detection of IPFs scalable for large web applications that support multiple languages.

1.2.2 Insight 2: Correct layout of a web page can be defined as a system of constraints

An existing approach that repairs IPFs, IFix, uses search-based techniques to find repairs. The problem with this approach is that it requires a long time to explore the space of possible repairs. The reason is that IFix tries a large number of different values for the CSS properties of the faulty HTML elements. It then evaluates the correctness of each possible set of CSS values by rendering a modified version of the web page and testing it.
This process needs to be repeated hundreds or thousands of times until a repair is found, which results in an extensive amount of time to find a repair. For example, in some cases IFix needs 19 minutes to repair a single web page. To address this problem, my key insight is that we can analytically find the correct values that lead to a repair. The correct layout of a web page can be defined as a system of linear constraints that are inferred from the visual relationships between elements in the baseline web page. Defining the correct layout of the web page as a set of constraints allows us to leverage the ability of state-of-the-art constraint solvers to quickly solve these constraints and identify solutions that can be used as web page repairs. Constraint-based approaches are ideal for this problem because they allow us to find repairs without exhaustively exploring the large space of possible solutions and rendering the web page for each possible solution.

1.2.3 Hypothesis

Based on these insights, the hypothesis of my dissertation is: Internationalization Presentation Failures can be detected and repaired with high effectiveness and efficiency using approaches based on modeling the visual relationships among elements in the web page.

To evaluate this hypothesis, I designed and implemented two techniques that are based on modeling the visual relationships in the baseline web page. The first technique is for detecting and localizing IPFs in a web page. The technique uses a model of the visual relationships among elements to determine if the translated web page exhibits an IPF. It also uses the same model to localize the set of elements that are causing the IPF. I evaluated the effectiveness of this technique by computing its detection and localization accuracy and comparing these metrics against the results of existing techniques. I also evaluated my technique's efficiency by computing its running time and comparing it against the running time required by the existing techniques.
The empirical evaluation of this technique demonstrated that it is highly effective in detecting IPFs in a web page, and that it can perform the detection within a few seconds, compared to the several minutes required by other techniques. The second technique I designed and implemented is for repairing web pages that exhibit IPFs. This technique uses a model of the visual relationships among elements in the baseline to determine the correct layout of the faulty translated web page. It then uses that model to fix the web page and make it match the baseline. I evaluated the effectiveness of this technique by computing the percentage of reduction in IPFs and comparing it against an existing state-of-the-art IPF repair technique. I also conducted a user study to evaluate the repair effectiveness from a human perspective, quantifying the impact of the generated repairs on the attractiveness and the readability of the web pages. To evaluate the efficiency of this technique, I computed its running time and compared it with the running time required by the existing IPF repair technique. The evaluation of this technique showed that it could more quickly produce repairs that were rated as more attractive and more readable than those produced by the prior state-of-the-art technique. The results of the two techniques confirmed the hypothesis of my dissertation and indicate that my research is useful in supporting developers to automatically detect, localize, and repair IPFs in web applications.

1.3 Contributions

The contributions of my dissertation include the design, development, and evaluation of two approaches that aid software engineers in detecting, localizing, and repairing IPFs.

1. Detection and Localization: I designed and evaluated an approach that is based on modeling the layout of a web page using visual relationships between elements. To the best of my knowledge, my approach was the first to automate detecting and localizing IPFs.
As part of this contribution, I also conducted an extensive evaluation to demonstrate the effectiveness and the efficiency of my approach on a set of real-world web applications. I discuss this contribution in Chapter 3.

2. Repair: I designed and evaluated a repair approach that is based on modeling the layout of a web page using layout constraints. To the best of my knowledge, my approach was the first to automate repairing IPFs by solving layout constraints. As part of this contribution, I also evaluated this approach to demonstrate its effectiveness and efficiency in repairing IPFs that were detected in a set of real-world web applications. I discuss this contribution in Chapter 4.

1.4 Overview of Publications

In this section, I provide an overview of the publications that I published during the course of this dissertation. My dissertation work mainly comprises two bodies of work that correspond to the two techniques I developed to detect and repair IPFs. Each of the chapters is based on one or more papers, which have been published or are under submission. The eight papers corresponding to the two bodies of work are listed below. For each of the papers, I was the primary author (or one of the primary authors), with contributions including the idea, design, implementation, and evaluation of the work. All of the papers were co-authored with my Ph.D. advisor, Prof. William G. J. Halfond.

Chapter 3: Detection of Internationalization Presentation Failures

In this chapter, I discuss the detection technique, GWALI, that I designed for detecting and localizing IPFs in web pages. This work was originally published in the research track of the IEEE International Conference on Software Testing, Verification and Validation (ICST) in 2016, and was a recipient of the IEEE Best Paper Award [21]. An extended version of this work is currently in preparation to be submitted to a software engineering journal [20].
The two papers describing the work and its extension were co-authored with Sonal Mahajan, a fellow Ph.D. student at USC. Related to this work, I also conducted an empirical study to analyze the frequency, severity, and types of IPFs in web applications. This study was published at the IEEE International Conference on Software Maintenance and Evolution (ICSME) [19].

1. [21] Abdulmajeed Alameer, Sonal Mahajan, and William G.J. Halfond. Detecting and localizing internationalization presentation failures in web applications. In Proceedings of the 9th IEEE International Conference on Software Testing, Verification, and Validation (ICST), April 2016

2. [20] Abdulmajeed Alameer, Sonal Mahajan, and William G. J. Halfond. Detecting and Localizing Internationalization Layout Failures in Web Applications. In submission

3. [19] Abdulmajeed Alameer and William G.J. Halfond. An empirical study of internationalization failures in the web. In Proceedings of the International Conference on Software Maintenance and Evolution (ICSME), October 2016

Chapter 4: Repair of Internationalization Presentation Failures

In this chapter, I discuss the repair technique, CBRepair, which I designed for repairing IPFs that are detected in a web page. This work was originally published at the ICST conference in 2019 [18]. The paper describing this work was co-authored with Paul Chiou, a fellow Ph.D. student at USC. Related to this work, I also participated in the design, implementation, and evaluation of another technique, IFix [40, 37], that targets repairing IPFs. Unlike CBRepair, IFix uses search-based approaches to repair IPFs that are detected in a web page. In Chapter 4, I discuss the limitations of IFix and compare it against CBRepair. Related to repairing presentation issues in web pages, I participated in the evaluation of another technique, XFix [38, 39], which targets repairing presentation failures that arise from viewing web pages in different browsers.
Such presentation failures are similar in nature to IPFs, as both require modifying CSS properties in the web page to repair it. The papers describing both the IFix and XFix techniques were co-authored with Sonal Mahajan and Prof. Phil McMinn, a collaborator from the University of Sheffield, UK.

4. [18] Abdulmajeed Alameer, Paul Chiou, and William G.J. Halfond. Efficiently repairing internationalization presentation failures by solving layout constraints. In Proceedings of the IEEE International Conference on Software Testing, Verification, and Validation (ICST), April 2019

5. [40] Sonal Mahajan, Abdulmajeed Alameer, Phil McMinn, and William G.J. Halfond. Automated repair of internationalization failures using style similarity clustering and search-based techniques. In Proceedings of the International Conference on Software Testing, Validation and Verification (ICST), April 2018

6. [37] Sonal Mahajan, Abdulmajeed Alameer, Phil McMinn, and William G.J. Halfond. Effective repair of internationalization presentation failures in web applications using style similarity clustering and search-based techniques. In submission

7. [38] Sonal Mahajan, Abdulmajeed Alameer, Phil McMinn, and William G.J. Halfond. Automated repair of layout cross browser issues using search-based techniques. In Proceedings of the International Symposium on Software Testing and Analysis (ISSTA), July 2017

8. [39] Sonal Mahajan, Abdulmajeed Alameer, Phil McMinn, and William G.J. Halfond. XFix: An automated tool for repair of layout cross browser issues. In Proceedings of the International Symposium on Software Testing and Analysis (ISSTA) - Tool Demo, July 2017

Chapter 2
Overview of Dissertation

In this chapter, I provide an overview of my dissertation. The goal of my dissertation is to develop techniques that can automatically detect, localize, and repair IPFs in web applications. In order to achieve this goal, my dissertation is divided into two thrusts.
The first thrust is the detection and localization of IPFs. The second thrust is repairing IPFs that appear in a web page. Figure 2.1 shows an overview of the process of my dissertation. The inputs to the first thrust are two web pages: the first is the Page Under Test (PUT), and the second is a baseline version of the page that shows the correct layout. Typically, the baseline would be the original version of the page, which is already known to be correct and will be translated to another language, as represented in the PUT. The output is a set of potentially faulty elements that are represented by an ordered set of XPaths. For the second thrust, there are three inputs: the first input is a PUT containing one or more IPFs. The second input is a list of faulty elements that are reported by the first thrust. The third input is the Layout Graph of the baseline web page, which defines the correct layout of the web page. The output of the second thrust is a repaired version of the PUT. For both thrusts, the input web pages are provided by a Uniform Resource Locator (URL) that points to them, and they can be stored on a local machine or over the network.

The first thrust of my dissertation is the detection and localization of IPFs in web applications. The goal of this thrust is to automatically detect if an internationalized web page contains an IPF, and if so, report a list of potentially faulty HTML elements in the web page. To achieve this goal, I designed and evaluated GWALI (Chapter 3). In order to detect and localize IPFs, my approach has three main steps. First, for the two inputs (the baseline and the PUT), it generates layout graphs, which are models that abstract the layout of web pages.
Second, it compares the two layout graphs and identifies differences in the layouts of the two web pages that are caused by text expansion and contraction after translation of the PUT. These differences are represented by a set of edges in the PUT layout graph that are different from the baseline layout graph. This set of edges represents potential IPFs in the PUT. Third, using the identified edges that represent differences between the two layout graphs, my approach ranks and reports a set of potentially faulty elements. This set of elements is reported as an ordered set of XPaths. This localization of faulty elements guides the second thrust of my dissertation into repairing IPFs efficiently.

Figure 2.1: Dissertation overview

The second thrust of my dissertation is the repair of IPFs. The goal of this thrust is to repair IPFs, if they occur, in the internationalized web page. To achieve this goal, I designed and evaluated CBRepair (Chapter 4). My repair approach, CBRepair, uses a list of IPFs that is detected in the web page by the first thrust as an input. Having such a list reduces the number of constraints my repair approach needs to model, which makes repairing IPFs more efficient. Modeling constraints for the whole layout of the web page in order to repair the IPFs is also possible, but this would result in an extremely large number of constraints, which makes the process of solving them and applying the repair to the web page time-consuming.
To repair a web page, my approach identifies the correct visual relationships between the faulty elements using the Layout Graph of the baseline web page that is generated in the first thrust, then models these relationships as layout constraints. The repair approach then solves these constraints and generates a layout for the PUT that matches the baseline and also accommodates the translated text. Finally, the generated layout is used to produce the repaired PUT.

Chapter 3
Detecting and Localizing Internationalization Presentation Failures

The goal of the first part of my dissertation is to automatically detect and localize Internationalization Presentation Failures (IPFs) that are exhibited in a translated web page. In this chapter, I give a detailed description of my detection and localization approach. I also provide an empirical evaluation of the approach to demonstrate its effectiveness and efficiency in detecting and localizing IPFs.

Various types of internationalization failures can arise in web applications, but not all of them are under the control of developers. These potential failures include: missing or incorrect translation of the text in the webpage, not using the proper local conventions, such as measurement systems, and using culturally inappropriate images or colors. Although impactful, the solutions to these specific types of failures are typically under the control of web designers and text translators, so I do not focus on them in my dissertation. In this chapter, I focus on detecting and localizing IPFs, which are distortions of a webpage's appearance after its translation to a different language. Figure 3.1 shows a real world example of an IPF taken from the Hotwire website. In the Mexican version of the page, the price text overflows its container and the description text is overlapping with the image. In general, the reason this type of problem occurs is because the size of the text varies significantly based on its language [32].
This change in the size of the text is mainly affected by three factors: the number of characters in the translated text, the language's characters' width, and the language's characters' height. Some of these changes can be rather dramatic. For example, the English word "please" translates to "S'il vous plaît" in French, which is almost three times longer. More generally, IBM internationalization guidelines state that an English text that is shorter than ten characters could increase in size from 100% to 200% when translated to another language [13]. The resulting change in text size causes the text to overflow its intended containing element or, if the containing element grows with the text, the container growth can cause other elements to move or overlap, changing the layout of the page.

Figure 3.1: Part of Hotwire web page and an internationalized version having IPFs — (a) British version, (b) Mexican version

This chapter is organized as follows. In Section 3.1, I describe my detection and localization approach in detail. Then in Section 3.2, I give a detailed evaluation of the approach. Finally, I give a summary in Section 3.3.

3.1 Approach

The goal of my approach is to automatically detect IPFs and identify the translated text that is responsible for the failure. My key insight is that IPFs are caused by changes in the size of translated text. Therefore, the approach defines and builds a model, called the Layout Graph (LG), that captures the visual relationships and relative positioning of HTML tags and text elements in a web page (Section 3.1.1). To use my approach, a tester provides two web pages as input: the first is the PUT and the second is a baseline version of the page that shows the correct layout. Typically, the baseline would be the original version of the page, which is already known to be correct and will be translated to another language, as represented in the PUT. The approach first builds an LG for each of these pages (Section 3.1.2).
Then it compares these two LGs and identifies differences between them that represent potentially faulty elements (Section 3.1.3). Finally, the approach analyzes and filters these elements to produce a ranked list of elements for the developer (Section 3.1.4).

3.1.1 Layout Graph Definition

The LG is a model of the visual relationships of the elements of a web page. As compared to models used in related work, such as the alignment graph [25] and R-tree [42], the LG focuses on capturing the relationships of not only the HTML tags, but also the text contained within the tags. The reason for this is that the primary change to a web page after internationalization is that the text contained within the HTML tags has been translated to another language. The translated text may expand or shrink, which can cause an IPF. Therefore, the LG includes the text elements so that these changes can be more accurately modeled and compared.

The LG is a complete graph defined by the tuple ⟨V, F⟩, where V is the set of nodes in the graph and F is a function F : V × V → P(R) that maps each edge to a set of visual relationships defined by R. Each node in V represents an element that has a visual impact on the page. A node is represented as a tuple ⟨t, c1, c2, x⟩, where t is the node type and is either "Element" (i.e., an HTML tag) or "Text" (i.e., text inside of an HTML tag), c1 is the coordinate (x1, y1) representing the upper left corner of the node's position on the page, c2 is the coordinate (x2, y2) representing the lower right corner of the node, and x is the XPath representing the node. The two coordinates represent the Minimum Bounding Rectangle (MBR) that encloses the element or text. The set R of possible visual relationships can be broken into three categories: direction (i.e., North, South, East, West), alignment (i.e., top, bottom, left, right), and containment (i.e., contains and intersects).
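To make the relationship categories concrete, the following is an illustrative Python sketch (not the actual GWALI implementation, which is written in Java) of how the relationship set for an edge (v, w) could be computed from the two MBRs, following the computations given later in Table 3.2; all names are hypothetical:

```python
def relationships(v, w):
    """Compute the visual relationship set for an edge (v, w), where each
    node's MBR is given as ((x1, y1), (x2, y2))."""
    (vx1, vy1), (vx2, vy2) = v
    (wx1, wy1), (wx2, wy2) = w
    rels = set()
    # Direction: v lies entirely to one side of w.
    if vx2 <= wx1: rels.add("West")
    if vy2 <= wy1: rels.add("North")
    if vx1 >= wx2: rels.add("East")
    if vy1 >= wy2: rels.add("South")
    # Alignment: shared edge coordinates.
    if vx1 == wx1: rels.add("Left-Aligned")
    if vy1 == wy1: rels.add("Top-Aligned")
    if vx2 == wx2: rels.add("Right-Aligned")
    if vy2 == wy2: rels.add("Bottom-Aligned")
    # Containment: v's MBR encloses w's MBR.
    contains_vw = vx1 <= wx1 and vy1 <= wy1 and vx2 >= wx2 and vy2 >= wy2
    contains_wv = wx1 <= vx1 and wy1 <= vy1 and wx2 >= vx2 and wy2 >= vy2
    if contains_vw:
        rels.add("Contains")
    # Intersects: no directional or containment relationship holds.
    if not (rels & {"West", "North", "East", "South"}) \
            and not contains_vw and not contains_wv:
        rels.add("Intersects")
    return rels
```

For instance, an image whose MBR sits directly below its sibling container and shares its left and right edges would yield the set {"South", "Left-Aligned", "Right-Aligned"}.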
3.1.2 Building the Layout Graph

In the first phase, the approach analyzes the PUT and baseline page to build an LG of each. The approach first analyzes the Document Object Model (DOM) of each page to define the LG's nodes (i.e., V) and then identifies the visual relationships between the nodes (i.e., F).

Figure 3.2: Web page snippets of an original and two internationalized versions — (a) American version (original), (b) Mexican version (internationalized), (c) Arabic version (internationalized)

3.1.2.1 Defining the Layout Graph Nodes

The first step of building the LG is to analyze the baseline page and PUT and compute the nodes in the LG. For each of these pages, this process proceeds as follows. The page is rendered in a browser, whose viewport size has been set to a predefined value. This chosen viewport size has to be the same for both pages for a given comparison, but can be varied to detect IPFs at different resolutions. After the page is rendered in a browser, the approach uses the browser's API to traverse the page's DOM. For each HTML tag h in the DOM, the approach collects h's XPath ID (i.e., x), finds h's MBR based on the browser's rendering of h (i.e., c1 and c2), and assigns the type "Element" to the tag. If the node contains text (e.g., text between <p> tags or as the default value of an <input> text box) then the approach also creates a node for the text itself. For this type of node, the XPath is the XPath of the containing node plus the suffix "/text()", the MBR is based on the size and shape of the text within the enclosing element, and the type is denoted as "Text."

Figure 3.3: The MBRs of the snippets shown in Figure 3.2
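The node-creation step can be sketched as follows. This is an illustrative Python sketch over a mocked, already-rendered DOM (nested dicts standing in for the browser's tree); in reality the approach queries the browser's API for tags and MBRs, and all names here are hypothetical:

```python
# Illustrative sketch of Layout Graph node construction; a real
# implementation would obtain tags, text, and MBRs from the browser.

def build_nodes(dom, parent_xpath=""):
    """Create one "Element" node per tag, plus a "Text" node for any tag
    that directly contains text."""
    xpath = f"{parent_xpath}/{dom['tag']}"
    nodes = [{"type": "Element", "xpath": xpath,
              "c1": dom["c1"], "c2": dom["c2"]}]
    if dom.get("text"):
        # Text nodes reuse the container's XPath plus a "/text()" suffix;
        # the MBR of the text itself is assumed to be precomputed.
        nodes.append({"type": "Text", "xpath": xpath + "/text()",
                      "c1": dom["text_c1"], "c2": dom["text_c2"]})
    for child in dom.get("children", []):
        nodes.extend(build_nodes(child, xpath))
    return nodes

# Hypothetical rendered snippet: a <div> with text and a child <img>.
page = {"tag": "div", "c1": (0, 0), "c2": (100, 50),
        "text": "Hotel", "text_c1": (5, 5), "text_c2": (40, 15),
        "children": [{"tag": "img", "c1": (0, 60), "c2": (100, 120)}]}
nodes = build_nodes(page)
```

The sketch omits the three exceptions (hidden tags, purely structural tags, and tags embedded in text) that the full approach applies while traversing the DOM.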
This process is repeated for all HTML tags found in the page's DOM, with three exceptions, which I describe below.

Figure 3.4: The LGs of the MBRs shown in Figure 3.3 — (a) American version (LG), (b) Mexican version (LG′), (c) Arabic version (LG′)

Table 3.1: Different mechanisms of hiding HTML elements in a web page
1. display = none
2. visibility = hidden
3. clip = rect(0px, 0px, 0px, 0px) ∨ clip = rect(1px, 1px, 1px, 1px)
4. width = 0px ∨ width = 1px
5. height = 0px ∨ height = 1px
6. ancestor's overflow = hidden ∧ element is not inside ancestor
7. ancestor is hidden using any mechanism listed in this table

The first exception is HTML tags that are not visible in the page. These tags do not affect the layout of the page and therefore do not have a visual relationship with any other tag. Officially, there are specific HTML and CSS properties, such as visibility:hidden and display:none, that can be used to cause a tag to not display. Unofficially, there are a myriad of ways that a developer can use to hide an element. These include setting the height or width CSS properties to zero; using the clip CSS property to cut an element to a zero-pixel rectangle; and placing the
My approach does not create a node in the LG for these hidden HTML tag. This detection is done by querying the browser's API after the page is rendered, getting the applicable HTML and CSS properties for each element, and then determining if the element falls into one of the identied categories. Table 3.1 shows a complete list of the dierent mechanisms of hiding elements in a web page that my approach detects. The second exception is for HTML tags that do not aect the layout of the page. The tags are not explicitly hidden, as described above, but are nonetheless not visible in the page's rendering. These types of tags may be used to provide logical structure to the page. For example, 18 a <div> may be used as a container to group other nodes. As with hidden tags, there are many ways to dene these tags. Some of the heuristics I employ for this identication process are: (1) container elements that do not have a border and whose background color is similar to its parent's background color; (2) tags that have a very small dimension; (3) tags only used for text styling, such as <font>, <strong>, and <b>; and (4) tags representing an unselected option in a select menu. The third and nal exception is for HTML tags embedded in the text of another tag. An example of this is shown in Figure 3.5 where the link labeled \options" has moved to the second line after the translation and therefore would have a dierent visual relationship with the link labeled \free trials." Intuitively, we know that such changes are inevitable due to natural contraction and expansion of the translated text and should not be considered as IPFs. Therefore, the approach groups such tags together and creates one node in the LG for them with an MBR that surrounds all of the grouped elements and assigns to that node the type \Text." These three exceptions and the heuristics employed to identify them improve the detection accuracy of my approach. 
The completeness of these exceptions is not crucial for my approach's correctness because they are only used to remove elements that are not observable from the analysis of the LG. Adding these elements to the LG would make my approach analyze them in further steps and potentially report them as having IPFs even though they are not observable, which decreases the accuracy of my approach.

Figure 3.5: Example of non-failure text changing position after translation

3.1.2.2 Annotating Layout Graph Edges

After computing the nodes of the graph, the second step of the approach is to define the F function, which annotates each edge in the graph with a set of visual relationships. Recall that an LG is a complete graph, so this step is computing the visual relationship between each pair of nodes on each edge. To compute the visual relationship between two nodes on an edge, the approach compares the coordinates of each node's MBR. For example, for an edge (v, w), if v.y2 ≤ w.y1 then the relationship set would include North. Similarly, if v.y2 = w.y2 then the set would include Bottom-Aligned, and if (v.x1 ≤ w.x1) ∧ (v.y1 ≤ w.y1) ∧ (v.x2 ≥ w.x2) ∧ (v.y2 ≥ w.y2) then it would include the Contains relationship. The other relationships are computed in an analogous manner. Table 3.2 shows a complete list of the relationships my approach considers along with their computation.

The set of visual relationships my approach captures is adapted from the Android RelativeLayout API [1]. Android RelativeLayout allows developers to design the layout of a Graphical User Interface (GUI) by specifying the visual relationships between its components. In my approach, I selected the subset of these visual relationships that can be directly affected because of the expansion and contraction of text after translation.
This subset captures changes that have a direct impact on the aesthetics of the layout of a web page, and at the same time ignores minor changes that are hard to preserve after the expansion and contraction of text. Developers who use my approach can easily extend this set and apply a more strict or more relaxed set of visual relationships that need to be enforced in the translated web pages. For example, a more relaxed set of visual relationships can ignore the relationships that are related to the side alignment and only enforce directional and containment visual relationships. On the other hand, developers can enforce more strict relationships, such as preserving the centering alignment between elements and maintaining the size proportions between elements.

To illustrate the graph building process, consider the three web pages shown in Figure 3.2. The MBRs identified for these three web pages are shown in Figure 3.3. Next, Figure 3.4 shows the LGs produced for each version in the example. My approach produces two LGs at a time to be compared. I refer to the baseline (i.e., the American version) as LG, and the PUT (i.e., the Mexican version or the Arabic version) as LG′. As can be seen, the graph is a complete graph with edges labeled with the spatial relationships between the nodes they represent.
For example, the edge (../img, ../div) is labeled with the relationships "South, RAlign, LAlign," which means that the element ../img is to the South of the element ../div and is also Right Aligned and Left Aligned with it.

Table 3.2: Layout Graph visual relationships along with their computation for an edge (v, w)
West: v.x2 ≤ w.x1
North: v.y2 ≤ w.y1
East: v.x1 ≥ w.x2
South: v.y1 ≥ w.y2
Left-Aligned: v.x1 = w.x1
Top-Aligned: v.y1 = w.y1
Right-Aligned: v.x2 = w.x2
Bottom-Aligned: v.y2 = w.y2
Contains: (v.x1 ≤ w.x1) ∧ (v.y1 ≤ w.y1) ∧ (v.x2 ≥ w.x2) ∧ (v.y2 ≥ w.y2)
Intersects: ¬(West(v, w) ∨ North(v, w) ∨ East(v, w) ∨ South(v, w) ∨ Contains(v, w) ∨ Contains(w, v))

3.1.3 Layout Graph Comparison

In the second phase, the approach compares the two LGs produced by the first phase in order to identify differences between them. The differences that result from the comparison represent potentially faulty tags or text that will be filtered and ranked in the third phase. A naive approach to this comparison would be to pair-wise compare the visual relationships annotating all edges in LG and LG′. In experiments, I found that the drawback of this approach was that differences were detected for tags that were far away from each other on the page and whose relative change in position was not an IPF. Instead, my approach compares subgraphs of nodes and edges that are spatially close to a given node n in the LG. My insight, confirmed by the experiments reported in Section 3.2, is that comparing these more limited subgraphs of LG and LG′, which I refer to as neighborhoods, is sufficient to accurately detect IPFs and the responsible faulty elements. Before any comparison can take place, the approach must detect the language of the baseline and the PUT to determine if they have the same script direction.
If the languages use different script directions (i.e., one is Right-to-Left (RTL) and the other is Left-to-Right (LTR)), the approach needs to consider the mirroring in the LG that is caused by the difference in the rendering direction of the page. In the rest of this section, I explain the details of this comparison, which requires first determining the nodes to be compared (Section 3.1.3.1), then identifying the neighborhood for each node (Section 3.1.3.2). After that, determining the language of the pages (Section 3.1.3.3), then comparing the two LGs using one of two mechanisms: (Section 3.1.3.4) if the direction of the baseline and the PUT scripts is the same, or (Section 3.1.3.5) if the directions are different.

3.1.3.1 Matching the Nodes to be Compared

Before any comparison can take place, the approach must identify nodes in LG and LG′ that represent the same HTML element. Although each node contains an XPath, as I described in Section 3.1.1, certain translation frameworks, such as the Google Translate API, may introduce additional tags. This means that the XPaths will not be an exact match. To address this problem, I adapted a matching approach defined by the WebDiff XBT technique [26]. This approach matches elements probabilistically using the nodes' attributes, tag names, and the Levenshtein distance between XPath IDs. I adapted this approach to account for a common variation introduced by the Google Translate framework, which embeds two nested <font> tags for any translated text in the web page. The matching technique defined in WebDiff uses the string Levenshtein distance between the XPath IDs (i.e., the number of characters that need to be added, removed, or replaced to make one XPath match the other). My approach changes this metric by making it compute the tag Levenshtein distance instead (i.e., the number of HTML tags that need to be added, removed, or replaced to make one XPath match the other). This change made the matching algorithm more accurate.
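The tag-level Levenshtein distance can be sketched as follows. This is an illustrative Python sketch (not the actual implementation): splitting on "/" treats each XPath step as one editable unit, so the two <font> tags that Google Translate inserts cost an edit distance of 2 rather than inflating a character-level distance:

```python
def tag_levenshtein(xpath_a, xpath_b):
    """Edit distance over whole tag steps rather than characters."""
    a = [step for step in xpath_a.split("/") if step]
    b = [step for step in xpath_b.split("/") if step]
    # Standard dynamic-programming Levenshtein over tag sequences.
    prev = list(range(len(b) + 1))
    for i, tag_a in enumerate(a, 1):
        cur = [i]
        for j, tag_b in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # delete tag_a
                           cur[j - 1] + 1,         # insert tag_b
                           prev[j - 1] + (tag_a != tag_b)))  # substitute
        prev = cur
    return prev[-1]
```

For example, comparing a baseline text node's XPath with its translated counterpart that gained two nested <font> wrappers gives a distance of 2.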
The output of the matching approach is a map M that matches each HTML tag or text in the baseline page with a corresponding tag or text in the PUT. In my experiments, I found that this matching was close to perfect for my subjects because the translation API introduced regularized changes for all translated elements.

3.1.3.2 Identifying the Neighborhoods

After computing M, the approach then identifies the neighborhood for each node n ∈ LG. To do this, the approach first computes the coordinates of the four corners and center of n's MBR. Then, for each of these five points, the approach identifies the k-Nearest Neighbors (k-NN) nodes in the LG. The neighborhood is defined as the union of the five points' k-NNs. The closeness function in the k-NN algorithm is computed based on the spatial distance from the point to any area occupied by another node's MBR. The calculation for this is based on the classic k-NN algorithm [53]. In my experiments, I found that the approach works best when the value of k is set proportionally to the number of nodes in the LG.

3.1.3.3 Detecting Page Language

After identifying the neighborhood for each node n ∈ LG, my approach automatically detects the language of the baseline and PUT web pages to determine the comparison mechanism to be used. Pages written in an RTL script need special handling because the browser renders them from right to left, which makes their layouts mirrored. In the example, directly comparing the Arabic version with the English baseline will lead my approach to falsely report the mirrored elements as IPFs. This is because any two elements in the page that have a "West" relationship are expected to have an "East" relationship after the mirroring, and directly comparing the edges will result in flagging the mirrored elements as potentially faulty elements, where in fact they are not. A naive way to detect the language of the page is to look at the lang and dir attributes in the <html> tag of the web page.
However, a previous study showed that many websites do not use such attributes [19]. To automate the language detection process, my approach uses the popular language detection library, Compact Language Detector 2 (CLD2) [5]. The input to CLD2 is the URL of the web page and the contents of the webpage. The library then uses a Naive Bayesian classifier to detect the language of the text in the page. If the webpage has multiple languages, CLD2 returns the top three languages used in the webpage. If more than one language was identified, my approach selects the top language (i.e., the most frequently used language in the page). Once the language of the page is identified, my approach determines the direction of the script for that language by comparing it against a master list of languages that are known to have an RTL script. Once the script directions of the pages representing LG and LG′ are identified, my approach can compare them. If the baseline and the PUT are written in languages with the same script direction (e.g., the American and Mexican versions), my approach will proceed to comparing the LGs as described in Section 3.1.3.4. If the baseline and the PUT are written in languages with different script directions (e.g., the American and Arabic versions), the two LGs will be compared using the mechanism described in Section 3.1.3.5.

3.1.3.4 Comparing Layout Graphs with the Same Script Direction

If LG and LG′ have the same script direction, my approach compares them by determining if the relationships assigned to edges in a neighborhood have changed. To do this, the approach iterates over each edge e that is part of the neighborhood of any n in LG and finds the corresponding edge e′ in LG′, using the previously generated M function. Note that the corresponding edge always exists since both LGs are complete graphs. Then the approach computes the symmetric difference between F(e) and F(e′), which identifies the visual relationships assigned to one edge but not the other.
If the difference is non-empty, then the approach classifies the edge as a potential issue. The output of this step is I, a set of tuples of the form ⟨e, e′, Δ⟩, where Δ is the symmetric difference. To illustrate the graph comparison step, consider the LGs of the American and Mexican versions shown in Figure 3.4. Consider the node labeled "../div/div/text()." The bold edges connected to this node represent its neighborhood that will be compared against its counterpart in the other LG. In the figure, I have underlined and highlighted in red the labels on the edges for which there is a difference in the relationships. The edges with underlined red labels will be reported as potential issues and analyzed further in the third and final phase of my approach (Section 3.1.4).

3.1.3.5 Comparing Layout Graphs with Different Script Directions

If LG and LG′ have different script directions, then my approach takes into account the layout changes that are caused by the mirroring in the RTL version. Mirroring of the page layout changes two types of relationships between the pages' elements: Direction and Alignment relationships. By direction I mean that elements positioned on the left side of a web page will be positioned on the right side in the mirrored version and vice versa. By alignment I mean that elements that are right aligned in a web page will become left aligned in the mirrored version and vice versa. Three different strategies can be employed to compare the LGs of pages having different script directions. I implemented each one of these strategies and evaluated their effectiveness in Section 3.2.

The first strategy that can be employed is comparing the LGs without any mirroring. In other words, use the same comparison technique that is described in Section 3.1.3.4. However, such a strategy might not work for pages having different script directions because the mirroring of the elements in the page could be reported as IPFs, resulting in many false positives.
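The per-neighborhood comparison of Section 3.1.3.4 can be sketched as follows. This is an illustrative Python sketch, not GWALI's actual Java implementation; the edge-to-relationship maps and the node map M are assumed inputs and all names are hypothetical:

```python
def compare_neighborhoods(lg, lg_prime, neighborhood_edges, m):
    """Flag each neighborhood edge whose relationship set differs between
    the baseline LG and the PUT's LG (edges matched via the node map m)."""
    issues = []
    for (v, w) in neighborhood_edges:
        e_prime = (m[v], m[w])                  # always exists: LGs are complete
        delta = lg[(v, w)] ^ lg_prime[e_prime]  # symmetric difference of labels
        if delta:
            issues.append(((v, w), e_prime, delta))
    return issues

# Hypothetical single-edge example: a containment became an intersection.
issues = compare_neighborhoods(
    {("../img", "../div"): {"South", "Contains"}},
    {("../img", "../div"): {"South", "Intersects"}},
    [("../img", "../div")],
    {"../img": "../img", "../div": "../div"},
)
```

Each resulting tuple ⟨e, e′, Δ⟩ feeds the filtering and ranking phase.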
The second strategy is comparing the LGs and fully mirroring the edges in the LG of the PUT. In other words, use the same comparison technique that is described in Section 3.1.3.4, but replace any label "West" in the PUT's LG with the label "East", and vice versa. Also, replace any label "Right Aligned" with "Left Aligned" and vice versa. This strategy assumes full mirroring for the RTL pages, and might not work for RTL pages that are not fully mirrored. An example of such a page is shown in Figure 3.6. Although the baseline and the PUT have different script directions, not all the elements in the PUT are mirrored. The two buttons and the two text boxes are not mirrored in the RTL Arabic version. Therefore, the "East" and "West" relationships will not change between the two buttons and two text boxes. However, the two lines of text on the top are mirrored, as they are left aligned in the American version but right aligned in the Arabic RTL version.

The third strategy is comparing the LGs with mirroring the edges in the LG of the PUT, and excluding the edges that are not mirrored in the page from the mirroring process (partial mirroring). In other words, to compare LG and LG′, the third strategy handles the partial mirroring by performing the normal comparison as in Section 3.1.3.4 with the following exceptions when comparing F(e) and F(e′): (1) If F(e) contains "East" and F(e′) contains "West", then do not include them in the symmetric difference Δ. (2) Similarly, if F(e) contains "West" and F(e′) contains "East", then do not include them in the symmetric difference. (3) If F(e) contains alignment on one side (i.e., "Right Aligned" or "Left Aligned", but not both) and F(e′) contains alignment on the opposite side, then do not include them in the symmetric difference.

To illustrate these exceptions in the comparison step, consider the LGs of the American and Arabic versions shown in Figure 3.4. Consider the node labeled "../div/div/text()".
In the figure, the changes in the neighborhood edges from "East" to "West" and from "West" to "East" are not highlighted in red to be reported as potential issues. This is due to the first and second exceptions mentioned above. Also, for other nodes, the changes in the Right and Left Alignment are not highlighted to be reported due to the third exception. The only highlighted edges are the ones with the "Intersect" relationship, which will be reported as potential issues and analyzed further in the third and final phase of my approach (Section 3.1.4).

Figure 3.6: Example from a page not fully mirrored in its right-to-left version — (a) English version, (b) Arabic version

3.1.4 Ranking the Likely Faulty Elements

In the third and final phase, the approach analyzes the set of tuples, I, identified in the second phase and generates a ranked list of HTML elements and text that may be responsible for the observed IPFs. To identify the most likely faulty elements, the approach applies three heuristics to the tuples in I and then computes a "suspiciousness" score that it uses to rank, from most suspicious to least suspicious, the nodes associated with the edges in I.

The first heuristic serves to remove edges from I that were flagged as a result of to-be-expected expansion and contraction of text. An example of this is shown in Figure 3.5. Here a <div> container element surrounds the icon, title, and text block. The comparison of the two LGs would detect that the bottom alignment of the two text blocks has changed and the comparison would report a potential issue. However, in this case this should not be counted as an IPF because the container element has enough space to allow for the text to expand without disrupting anything outside of the containing element. To identify this situation, the approach identifies all edges where the type of the two constituent nodes is either Text/Element or Text/Text.
If the Δ of any of these edges contains alignment-related relationships, then these relationships are removed from Δ. If Δ is now empty, then the tuple is removed from I. This heuristic only allows alignment issues to be taken into account if they affect the visual relationship between nodes that represent HTML elements.

Figure 3.7: Example of an IPF in the header menu of Twitter's help center — (a) English version, (b) Russian version

The second heuristic establishes a method for ruling out low-impact changes in the relative location of two elements. An example of such a change is shown in Figure 3.7. In the English version, the edge between the header element "Discover" and the search box is labeled West. After the translation to Russian, the element is shifted, and the relationship is changed, so the label becomes South. Although this is technically a distortion, it may or may not rise to the level of being considered an IPF. To calibrate for such situations, my approach allows testers to provide a threshold θ that denotes the degree of allowed change. For each pair of nodes in an edge in I, if the Δ of that edge contains direction-related relationships, then the approach uses the coordinates of the MBRs to calculate the change (in degrees) of the angle between the two nodes forming the edge. If the change is smaller than θ, then these direction relationships are removed from Δ. If Δ is now empty, then the tuple is removed from I. When computing the angles, my approach also considers cases where one of the baseline or the PUT is written in an RTL script and has a mirrored layout. In such cases, the angle is mirrored (angle = 180° − angle) to account for the layout mirroring. In my experiments I found that θ = 60° provided a reasonable balance in terms of flagging changes that would be characterized as disruptive and reducing false positives.

The third and final heuristic expands the set of edges in I to include suspicious ancestor elements of nodes whose relative positions have changed.
An example of such a change is also shown in Figure 3.7. The elements in the header are li elements. After the translation of the page, the element li[6], which represents "Troubleshooting", has been pushed down to a new row in the Russian version of the page. Here, we cannot consider the faulty element to be only li[6]. In fact, the expansion of the text in li[1] through li[5] caused li[6] to be pushed down, so I report an XPath selector that represents all the text children of the parent ul element. To handle this situation, when an edge in I is found that has a directional visual relationship that has changed, the approach traverses the DOM of the page to find the Lowest Common Ancestor (LCA) of both nodes and adds an XPath selector that represents all of its text children to the list of nodes that will be ranked.

After the three heuristics have been applied to I, the approach generates a ranked list of the likely faulty nodes. To do this, the approach first creates a new set I′ that contains tuples of the form ⟨n, s⟩, where n is any node present in an edge in I or identified by the third heuristic and s is a suspiciousness score, initialized to 0 for all nodes. The approach then increments the suspiciousness scores as follows: (1) every time a node n appears in an edge in I, the score of n is incremented; and (2) the score of a node n is increased by the cardinality of the difference set (i.e., |Δ|). For any XPath selector that was added as a result of the third heuristic, its suspiciousness score is incremented by the number of times it is added to the list. Once the suspiciousness scores have been assigned, the approach sorts I′ in order from highest score to lowest score and reports this list to the developer. This list represents a ranking of the elements determined to be the most likely to have caused the detected IPFs.
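The scoring scheme can be sketched as a short Python fragment. This is an illustrative simplification (nodes are represented by plain identifiers, and the input shapes are assumptions for the example), not the actual implementation.

```python
from collections import defaultdict

def rank_suspicious(issues, lca_selectors):
    """issues: list of (node1, node2, delta) tuples, where delta is the
    set of changed visual relationships for that edge.
    lca_selectors: the XPath selectors added by the third heuristic,
    with one entry per time a selector was added."""
    scores = defaultdict(int)
    for n1, n2, delta in issues:
        for node in (n1, n2):
            scores[node] += 1            # (1) appearance in an edge of I
            scores[node] += len(delta)   # (2) cardinality of the difference set
    for selector in lca_selectors:
        scores[selector] += 1            # once per addition by heuristic 3
    # Sort from most suspicious to least suspicious
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A node that participates in several flagged edges, each with several changed relationships, therefore accumulates a high score and rises to the top of the reported list.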
3.2 Evaluation

To assess the effectiveness of my approach for detecting and localizing IPFs, I implemented it as a prototype tool to measure its accuracy, the quality of its localization results, and the time needed for it to run. Specifically, I addressed the following research questions:

RQ1: What is the best strategy for comparing LGs of web pages having different script direction?
RQ2: What is the effect of the heuristics used by my technique in improving its detection accuracy?
RQ3: What is the accuracy of my technique in detecting IPFs in comparison with other techniques?
RQ4: What is the quality of the localization results provided by my technique in comparison with other techniques?
RQ5: How fast is my technique in detecting and localizing IPFs in comparison with other techniques?

To address these questions, I carried out an empirical evaluation of my approach on a set of real-world web applications and compared my results with three well-known approaches for detecting different types of presentation failures in web applications: WebSee [42], X-PERT [25], and Fighting Layout Bugs (FLB) [59]. In the following sections, I discuss the implementation of my approach, the subject applications, and the experiments for each of the research questions.

3.2.1 Implementation

I implemented my approach as a Java prototype tool, GWALI (Global Web Applications' Layout Inspector). I used several third-party libraries to implement some of the functionality required by my approach, including the Java Spatial Index library to find the k-Nearest Neighbors of the LG's nodes. Selenium Firefox Webdriver with Firefox version 47.1 was used to load the webpage and execute JavaScript to compute the MBRs of text and tag elements in the web page. I ran my experiments on a 64-bit Linux Ubuntu 14.04 machine with a screen resolution of 1920 × 1080, 32 GB of memory, and an Intel Core i7-4790 CPU.
Three of the RQs (3-5) compare GWALI against three well-known techniques for detecting presentation failures in web applications. Each of them required minor optimizations, which I describe below, to improve their effectiveness and applicability for IPFs. The first technique, WebSee [42], is a tool designed to find presentation failures in web applications using computer vision techniques. When running WebSee, I used WebSee's exclusion region feature to exclude areas that contained text from the comparison, since WebSee will report a failure once it finds the text in the PUT to be different from the text in the baseline. This is because WebSee relies on screenshot image comparison, and a difference in the pixels caused by changes in the characters of the text after translation of the text in the PUT will be reported as a failure. The second technique, FLB [59], detects common layout bugs in the page. The tool provides detectors for different kinds of problems. Some of these detectors, such as detecting images with invalid URLs or detecting text that is not readable due to low contrast with its background, are not related to IPFs. Their use would only add false positive detections to the output of FLB. Therefore, I only enabled the detector that could detect internationalization-related problems, which is detecting text that is near or overlapping with horizontal or vertical edges. The third technique, X-PERT [25], detects Cross-Browser Issues (XBIs). X-PERT works by loading a page into two different browsers and comparing their rendering. I modified the technique so that I could perform this comparison between two translated pages in two windows of the same browser. X-PERT provides three modules to perform the comparison. Only the structural XBI module was enabled in my runs of X-PERT, since the remaining modules (i.e., behavioral XBI and content XBI) were not relevant for detecting IPFs and would only cause false positive detections.
3.2.2 Subject Applications

The subject pool contained 70 different web applications. A complete list is available from the project's website [11]. To ensure diversity in the subject pool, I selected internationalized web applications that were: (1) visited frequently; (2) used different translation technologies; (3) translated into different languages; and (4) designed with different layouts and styles. I identified potential subjects from three different sources. The first of these was builtwith.com, which is a website that indexes web applications built using various technologies. This site helped me to easily locate websites that were built using different kinds of translation frameworks, and to ensure that my subjects covered a wide range of translation technologies. These frameworks include automatic translation frameworks, such as the Google Website Translator, and resource-file-based frameworks, such as MotionPoint. The second source was the Alexa top 100 most visited web sites list. The third and final source was high-profile websites that targeted international audiences, such as travel-related and telecom company websites, which were expected to provide their content in multiple languages. For each of these three sources, I manually inspected the identified websites to find web pages that contained an IPF. I found 36 such pages and used these as the ground truth for true positive IPFs. For true negative IPFs, I selected 34 pages that were internationalized but that did not contain an IPF. Among all the selected subjects, 16 were translated to a language with an RTL script. For each subject application, I obtained a baseline (correct rendering) page and a PUT in another language that contained either zero faults (i.e., a true negative) or at least one IPF (i.e., a true positive). Overall, the faulty web sites identified in this process contained a wide range of IPFs, whose impact ranged from misaligned text to a complete distortion of the page that left it impossible to read.
Figure 3.7 shows an example of the former and Figure 3.10 shows an example of the latter. To ensure my experiments were repeatable, I used the Scrapbook Firefox plugin to download local copies of the subject webpages along with all of the corresponding images and styles that were required for the pages to be rendered correctly. This enabled my experiments to avoid the risk that a page would be changed by its developers during my evaluation. In this process, I found that some subjects used JavaScript to dynamically change content (e.g., rotating main news items) in the DOM, which made it difficult to compare the two versions of the page unless their JavaScript events were synchronized. To avoid this, I disabled JavaScript in the pages after they were rendered in the browser.

3.2.3 Experiments and Results

To answer RQ1, I compared the detection precision and recall of the different strategies for comparing the LGs of the baseline and the PUT (see Section 3.1.3.5). These strategies are: (1) comparing the LGs without any mirroring (no-mirroring); (2) comparing the LGs with fully mirroring the edges in the PUT LG (full-mirroring); and (3) comparing the LGs with mirroring the edges in the PUT LG, while excluding the edges that are not mirrored in the page from the mirroring process (partial-mirroring). To compute the detection precision and recall for each strategy, I ran them on the subject applications that are written in RTL languages. I evaluated each strategy by comparing the output it produced with the ground truth. I considered a strategy to have a successful detection if it indicated there was a failure of any type in the PUT, regardless of whether the output contained the faulty element. Then I calculated the precision and recall of each strategy's ability to correctly detect whether there was a failure. Figure 3.8 shows the precision and recall of the three strategies.
Figure 3.8: Detection accuracy of the three different strategies to handle RTL language pages (precision: 50% no-mirroring, 60% full-mirroring, 86% partial-mirroring; recall: 100% for all three strategies)

Table 3.3: Results of IPF detection when removing the heuristics used by GWALI

Neighborhood  Alignment Filter  Direction Filter  Precision  Recall
     ✓               ✓                 ✓              92        100
     ✓               ✓                 ✗              58        100
     ✓               ✗                 ✓              63        100
     ✓               ✗                 ✗              55        100
     ✗               ✓                 ✓              75        100
     ✗               ✓                 ✗              52        100
     ✗               ✗                 ✓              56        100
     ✗               ✗                 ✗              52        100

The results in Figure 3.8 show that GWALI with the partial-mirroring strategy could detect IPFs in RTL language pages with 86% precision and 100% recall, which was much higher than the other strategies. The reason for this is that the RTL language subjects contained pages that are rendered in different ways. Some of the subjects are fully mirrored (i.e., rendered from right to left), others are partially mirrored, and there are also subjects that are not mirrored at all. The partial-mirroring strategy can handle all of these cases, while the other strategies can only handle some of them.

To answer RQ2, I evaluated the impact of each of the three heuristics described in my approach: (1) limiting graph comparison to neighborhood subgraphs (Section 3.1.3); (2) applying the alignment filter (Section 3.1.4); and (3) applying the direction filter (Section 3.1.4). To do this, I measured the precision and recall of GWALI's detection for all the subject applications with and without applying each of these heuristics. Table 3.3 shows the results of my experiments. To isolate the effect of each heuristic applied by my approach, I show in each row the results for a given configuration. The first three columns in the table indicate whether the heuristic is applied or not. The last two columns show the precision and recall for the given configuration. As can be seen from the table, using the heuristics defined in my approach is important to reduce the number of false positives when detecting IPFs.
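The precision and recall computations themselves are standard. As a concrete illustration in Python, hypothetical counts of 6 true positives, 1 false positive, and 0 false negatives would reproduce the partial-mirroring strategy's 86%/100% figures (the actual per-strategy TP/FP counts are not broken out in the text, so these inputs are assumptions):

```python
def precision_recall(tp, fp, fn):
    """Detection precision and recall, returned as rounded percentages.
    tp: true positives, fp: false positives, fn: false negatives."""
    precision = 100 * tp / (tp + fp)
    recall = 100 * tp / (tp + fn)
    return round(precision), round(recall)
```

For example, `precision_recall(6, 1, 0)` yields (86, 100).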
Disabling all the heuristics causes the precision of GWALI to drop from 92% to 52%.

To answer RQ3, I measured the precision and recall of GWALI's detection for the subject applications. I used the results of the best configuration of GWALI obtained from RQ1 and RQ2 and compared them against WebSee, X-PERT, and FLB. WebSee and X-PERT work under the assumption that the layouts of the baseline and the PUT match, so pages that are written in an RTL language and have a mirrored layout cannot be handled by these approaches. To mitigate the influence of RTL pages on the detection accuracy of WebSee and X-PERT, I show the results of the experiment twice. First, I show the results when including only subjects that are written in a left-to-right language. Second, I show the results when including all subjects. Figure 3.9 shows the precision and recall of GWALI compared to WebSee, X-PERT, and FLB.

Figure 3.9: Detection accuracy of GWALI compared to other approaches. Left-to-right subjects: precision of GWALI 94%, WebSee 56%, X-PERT 56%, FLB 74%; recall of 100%, 100%, 100%, and 57%, respectively. All subjects: precision of GWALI 92%, WebSee 51%, X-PERT 52%, FLB 70%; recall of 100%, 100%, 100%, and 58%, respectively.

For precision, the results in Figure 3.9 show that GWALI could detect IPFs with 92% precision when tested on all the subjects, which was much higher than the other approaches. I analyzed the results to understand the reasons for the false positives reported by each of the approaches. For GWALI, I found that there were two general causes of false positives. The first cause was when part of a web page's content was displayed in multiple layers using the z-index CSS property. The LGs generated by GWALI model the layout of the page in two-dimensional space, effectively merging the different layers into one. In some cases, web designers use the z-index CSS property to design sliding content areas. In these cases, the content of every slide has a different z-index value.
My model does not take that into account and merges the slides into one layer, which caused false positives. The second cause of false positives was when the change in a direction relationship was large (i.e., greater than θ, as described in Section 3.1.4), but this change did not distort the layout of the page. The θ = 60° value is a heuristic, and in some cases, such as when a large amount of text shrinks, the change can exceed this value. For FLB, the main cause of false positives was the incorrect identification of horizontal and vertical edges. This was due to inaccuracies in the image processing techniques used to identify the edges. For both WebSee and X-PERT, the underlying root cause of false positives was that both approaches detected any difference with the oracle as a failure. This included any slight shifting of elements due to expanding or contracting text size. It also included any mirroring of elements in pages written in a language having an RTL script.

For recall, the results in Figure 3.9 show that GWALI, WebSee, and X-PERT had 100% recall when tested on all the subjects. I investigated the results of FLB in order to better understand the reason for its false negatives. I found that this was due to the fact that the detectors in FLB could only detect a subset of the types of visible failures that could be associated with an IPF. In particular, FLB can only detect IPFs where the text is overlapping with an edge of a container or an image. Since not all IPFs had overlapping text, many IPFs were missed by the FLB approach.

Figure 3.10: Example of an IPF where text became overlapping with a button after translation

Overall, these results show that GWALI is very accurate in detecting IPFs. Although other approaches had the same level of recall, GWALI had much higher precision. It is important to note that these results do not indicate that the other techniques are, in general, inaccurate for their intended purpose.
Only that their detection mechanisms are not as effective as GWALI's when trying to detect IPFs.

To answer RQ4, I calculated the number of elements that a developer would be expected to examine when using the output of the evaluated approaches. To determine this, I first ran all of the approaches, except for FLB, on the 36 subjects that contained one or more IPFs. I did not evaluate the localization for FLB because it provides its output as a marked-up screenshot of the page, with the areas of the observed failures highlighted. It was not clear how this result could be quantified and compared with the other approaches. Both GWALI and WebSee report a ranked list of potentially faulty elements, which I used as a proxy measure for expected effort. Although an imperfect metric, rank is widely used and allows me to quantify and compare results without the expense of a field study with real developers. For subjects with only a single fault, I simply reported the rank of the faulty element after running the tool. For subjects with multiple faults, I calculated the rank of each fault using a methodology proposed by Jones and colleagues [33]. The general idea of this methodology is to report the rank of the first faulty element that appears in the result set, simulate the fix of that fault, and then rerun the analysis to get the ranking of the next highest fault. This is repeated until all faults are "fixed." The intuition behind this methodology is that it approximates the workflow of a developer who scans the results, fixes a fault, and then reruns the analysis to see if any more faults remain. Calculating expected effort for X-PERT is a little more complicated because it returns an unordered set. If the faulty element is present in this set, then its location follows a uniformly random distribution.
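The fix-and-rerun ranking methodology of Jones and colleagues can be sketched as a small simulation. This is an illustrative Python sketch under the assumption that the rankings produced after each simulated fix are supplied as a precomputed list; it is not the evaluation harness actually used.

```python
def expected_examinations(ranked_lists, faults):
    """ranked_lists[i]: the ranked output produced after i faults have
    been 'fixed'; faults: the set of faulty elements. Returns the rank
    recorded for each fault, in the order the faults are found."""
    remaining = set(faults)
    ranks = []
    for ranking in ranked_lists:
        for rank, element in enumerate(ranking, start=1):
            if element in remaining:
                ranks.append(rank)         # rank of the first fault found
                remaining.remove(element)  # simulate fixing that fault
                break
        if not remaining:                  # all faults fixed; stop rerunning
            break
    return ranks
```

For instance, if the first run ranks a fault second and the rerun (after the simulated fix) ranks the next fault first, the recorded ranks are [2, 1].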
In the case of a single fault, the expected effort is therefore (n + 1)/2, where n is the size of the set, since the developer would, on average, have to examine half the elements in the set before finding the faulty element. Calculating this metric for the case when there are multiple faults generalizes to the problem of calculating the average number of comparisons in a linear search for k items in an unordered set of size n, where the distribution of the k items is uniformly random. Equation (3.1) shows the standard equation for calculating this metric.

(n + 1) / (k + 1)    (3.1)

An additional complication in calculating the expected number of comparisons for X-PERT is that its output may not contain the faulty element. In this case, I need to approximate the amount of additional effort that would be required to find the element in the remaining set of elements contained in the page. The challenge in this approximation is knowing exactly how developers will search the remaining elements. Instead of directly addressing this question, I calculate an upper and lower bound based on the best and worst case scenarios. The best case scenario is that the developer finds the fault as they examine the first element in the remaining set of elements. The expected effort for this would then be the size of the set returned by X-PERT plus one. In the worst case, the developer performs a linear search of the remaining elements. To calculate this, I reuse Equation (3.1) to determine how many checks would be needed for this scenario. Equation (3.2) summarizes the formula for the expected number of checks for the scenario where the faulty element has not been found in the output set returned by X-PERT. In this equation, n is the size of the result set returned by X-PERT, m is the size of the PUT, and k is the number of faults.

expected checks = n + 1 (best case);  n + (m − n + 1) / (k + 1) (avg case)    (3.2)

The results are as follows.
For GWALI, the median number of expected comparisons was two; for WebSee, the median was 274; and for X-PERT, the median calculated with the best case assumption was 38 and with the worst case assumption was 233. It is important to note that these numbers represent a lower bound on the expected number of comparisons because I only calculated them for the subject applications that actually contained IPFs. If I were to compute this number over all subjects that were reported as having IPFs, then the numbers for all subjects would increase. If I assume a linear search for cases where there is no faulty element, then the median numbers for WebSee and X-PERT would rise significantly because they had a lower precision for detecting IPFs.

I further analyzed the ranking results of GWALI by compiling them into a histogram, which is shown in Figure 3.11. This diagram shows that over 80% of the correct faulty elements were ranked in the top six returned results. I consider this to be a very strong result. Using a tool such as Firebug [8], it would take a developer only a few minutes to use the reported XPath IDs to inspect the rendering of six elements. Overall, I consider these results to be a strong indication that GWALI can accurately localize the faulty element. As with RQ3, I want to emphasize that the results do not show that X-PERT and WebSee are not accurate techniques. Instead, I interpret the results as showing that the localization techniques defined in those approaches are not appropriate for IPFs and that the techniques proposed in my approach are both necessary and more accurate for the IPF localization problem.

To answer RQ5, I measured the running time of GWALI, WebSee, X-PERT, and FLB on the subject applications.
Figure 3.11: Histogram of the ranks reported by GWALI

For each tool, the total running time included the time required to start the tool, from loading the browser until the tool shut down and produced its output. For GWALI, I also measured the time required for each part of the approach. For each tool, Table 3.4 shows the average, minimum, and maximum measured running time.

Table 3.4: Execution time of GWALI (in seconds) compared to other approaches

            GWALI    WebSee    X-PERT     FLB
Average      8.22    301.90      8.58    8.89
Minimum      6.12     62.01      5.85    5.74
Maximum     25.03   1253.19     17.04   23.84

As the results show, GWALI needed 8.22 seconds on average to run, which is slightly faster than X-PERT (8.58 seconds) and FLB (8.89 seconds). WebSee was significantly slower than the other approaches because its ranking heuristics used sub-image comparison, an expensive image processing technique. I further analyzed the time required for the different operations carried out by GWALI. The average distribution is shown in Figure 3.12. As can be seen from the graph, the most expensive part of the approach is interacting with the browser to collect MBRs and to load the page in the browser. This represents the cost of loading and analyzing the baseline page and the PUT. In a real-world scenario, I could expect half of this cost (the part associated with loading and analyzing the baseline page) to be amortized over the total number of PUTs analyzed (i.e., the number of translated pages compared against). Also, note that 73% of GWALI's running time (6 seconds) was consumed by the process of loading the browser. This means that the performance of my approach can be significantly improved when testers use it to check a large number of test cases. This improvement can be achieved by loading the browser once and running all the test cases one after another in the same browser window.
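The amortization argument can be made concrete with a back-of-the-envelope calculation, sketched below in Python. The default values (8.22 s total, 6 s browser load) come from the measurements above; treating the remainder as fixed per-page work is a simplifying assumption for illustration.

```python
def amortized_time_per_page(total_time=8.22, browser_load=6.0, pages=1):
    """Expected per-page time when the one-time browser load is shared
    across all pages checked in the same browser session."""
    per_page_work = total_time - browser_load  # loading, comparing, etc.
    return browser_load / pages + per_page_work
```

For a single page this recovers the measured 8.22 seconds, while checking 100 pages in one session drops the per-page cost to roughly 2.3 seconds.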
The results for this RQ showed that my approach has a similar run time to FLB and X-PERT. These results are positive, since GWALI's runtime was mostly consumed by loading the browser and the average time was under ten seconds, which in absolute terms is not a long time.

Figure 3.12: Breakdown of the running time of GWALI (loading browser: 73.48%; collecting MBRs: 12.68%; building graphs: 6.83%; detecting script direction: 1.04%; comparing graphs: 5.96%)

3.2.4 Threats to Validity

There are threats to validity that I have tried to address during my evaluation. To address threats to external validity, I made sure that my subjects included a variety of different translation technologies and different languages, including languages written in an RTL script. I have also made the subjects publicly available via the project website [11] to enable independent inspection of the subjects. To address threats to internal validity, I checked the results manually to verify correctness. Another possible threat is that, in the subject applications, new IPFs could appear or disappear if I performed the experiments at a different screen resolution. In general, these IPFs could be easily detected by systematically running GWALI at different screen resolutions, as is done in techniques that target responsive web design [61, 60]. This should not have an impact on the accuracy results because the detection and localization mechanism that would be used for each screen resolution is exactly the same as the mechanism I evaluated in this section. Another threat is my assumption of how developers will use the result set produced by my tool. I used rank, which is a commonly used metric to measure the effectiveness of localization. When using rank, I assume that developers will examine the first ranked element and check whether or not it is faulty; if it is not faulty, they will examine the second reported element, and so on, until they find the actual faulty element.
My claim about the effectiveness of the localization in the approach relies on this assumption. Also, in case there are multiple failures in the page, my assumption is that developers use the process described in Section 3.2.3: once the first fault is found, it is fixed first, then the tool is rerun on the new version of the page, and this process is repeated until all faults are fixed or the result set contains only non-faulty elements. This is a commonly used assumption in fault localization to measure the effort needed to localize faults [33].

3.3 Summary

In this chapter, I introduced a novel automated approach that can detect and localize IPFs in web applications. I also defined a new data structure, the LG, which is a model that represents the layout of a webpage. My approach works by building LGs of the web pages and comparing these LGs to identify failures. To evaluate my approach, I implemented it as a prototype tool, GWALI, and tested it on 70 subject applications. The results of the evaluation show that my approach can detect IPFs with 92% precision and 100% recall. My approach can identify the faulty element with a median rank of two and with an average running time of 8.22 seconds per web page. These results are positive and show that my approach can help developers to detect and localize IPFs in web applications. This demonstrates that I have successfully finished the first step of my dissertation, which is an automated technique for detecting and localizing IPFs. Once an IPF is detected in a web page, the second step is to repair it. This motivates the next part of my dissertation, which is an automated technique that can efficiently repair IPFs in web applications.

Chapter 4

Repairing Internationalization Presentation Failures

The goal of the second part of my dissertation is to automatically repair IPFs that are detected in a translated web page.
In this chapter, I give a detailed description of my repair approach. I also provide an evaluation of the approach to demonstrate its effectiveness and efficiency in repairing IPFs.

When an IPF is detected, there are several strategies developers can use to repair the faulty HTML elements. One of these is to change the translation of the original text so that the length of the translated text closely matches the original. However, this solution is not always applicable, for two reasons. First, the translation of the text is not always under the control of developers, having typically been outsourced to professional translators or an automatic translation service. Second, a translation that matches the original text length may not be available. Therefore, a more suitable repair strategy is to adapt the layout of the internationalized page to accommodate the translation. To do this, developers need to identify the right sets of HTML elements and CSS properties among the potentially faulty elements, and then search for new, appropriate values for their CSS properties. Together, these new values represent a language-specific CSS patch for the web page. To ensure that the patch is employed at runtime, developers use the CSS :lang() selector. This selector allows developers to specify alternative values for CSS properties based on the language in which the page is viewed. Although this second repair strategy is relatively straightforward to understand, complex interactions among HTML elements, CSS properties, and styling rules make it challenging to find a patch that resolves all IPFs without introducing new layout problems or significantly distorting the appearance of a web UI. Consider the example in Figure 4.1.
After translation, the "first name" placeholder expands into "nome di battesimo", which results in it being cut off by the edge of the input field.

Figure 4.1: Example of an IPF in a web page translated from English to Italian, and an illustration of how repairing it can introduce other IPFs. (a) Correct and untranslated web page (baseline). (b) Translated web page containing an IPF (the "nome di battesimo" placeholder is cut off). (c) Increasing the width of the "nome di battesimo" input introduces another IPF. (d) Increasing the width of the outer DIV container (surrounding the elements) returns the "il casato" input to its position, but now the "il casato" input is not right-aligned with the "indirizzo email" input.

To resolve this IPF, the width of the "nome di battesimo" input field can be increased. However, this pushes the "il casato" input field to wrap to a new line, resulting in a new IPF (Figure 4.1c). To repair this new IPF, the outer DIV container's width can be increased, but this causes the "indirizzo email" input field to no longer be right-aligned with the "il casato" input field (Figure 4.1d). This sequence of layout adjustments demonstrates how repairing IPFs is a challenging process, as repairing one IPF can introduce another. IFix [40], an existing technique that targets IPFs, uses a search-based approach to repair faulty translated web pages.
Although the IFix technique can resolve a large number of IPFs, it can also negatively impact the readability of the page by reducing the size of the translated text, or reduce the page's attractiveness by allowing the repaired page to deviate substantially from the intended layout of the original untranslated page. IFix can also be slow, requiring up to 19 minutes in some cases to find a repair for a single web page. In this chapter, my goal is to address the limitations of the IFix technique by introducing a new approach for repairing IPFs in translated web pages. My approach is designed to produce repairs faster than IFix, and repairs that result in more attractive and readable repaired web pages. To do this, my approach models the layout of a web page as a system of constraints and leverages the ability of state-of-the-art constraint solvers to quickly identify solutions that can be used as web page repairs. This chapter is organized as follows. In Section 4.1, I describe my repair approach in detail. Then, in Section 4.2, I give a detailed evaluation of the approach. Finally, I give a summary of my work in Section 4.3.

Figure 4.2: Overview of my repair approach (inputs: baseline webpage, PUT, and list of IPFs; Step 1: Extract Visual Relationships; Step 2: Convert Relationships to Constraints; Step 3: Solve Constraints and Repair IPFs; the steps repeat until termination, producing the repaired PUT)

4.1 Approach

As stated earlier, the goal of my approach is to produce a repair that resolves IPFs in a translated web page in a way that is faster and results in more attractive and readable web pages than IFix. As with IFix, my approach's basic premise is that IPFs in a page can be resolved by changing the values of elements' layout CSS properties so that the elements' size after their text's translation can be accommodated. However, to make my approach faster, I define the repair problem as a constraint system whose solution represents new and correct values for the relevant CSS properties.
To make the pages more attractive and readable, I introduce constraints so that the solution will more closely conform to the design and layout of the original, correct web page. Figure 4.2 shows an overview of my approach. The inputs to the approach are: (1) a baseline web page, which is typically the original version of the page that is already known to be correct; (2) a PUT, which is an internationalized version of the baseline that exhibits one or more IPFs; and (3) a set of IPFs that have been detected using an IPF detection tool (e.g., GWALI [21]). An IPF is represented as a tuple of the form ⟨e1, e2, r⟩, where r is the type of the IPF that affects the two elements e1 and e2 (e.g., e1 and e2 are not aligned). In the first step, my approach analyzes the IPFs and the baseline to approximate a superset of the layout aspects that can be modified in order to resolve the IPFs. Then, in the second step, my approach converts these aspects of the layout into a system of constraints that can be solved and applied in the third step. My approach repeats these three steps until either all IPFs have been resolved or no further improvements are made to the PUT as a result of the previous iteration.

4.1.1 Step 1: Extracting visual relationships between elements

The goal of the first step is to identify the aspects of the PUT that must be modeled by constraints in order to resolve the detected IPFs. The aspects to be modeled include both the HTML elements whose CSS properties may need to be adjusted and the visual relationships (e.g., Right-Aligned) among those elements that must be either changed or not allowed to change to restore the correct layout. A naive solution to this problem would be to identify all visual relationships between e1 and e2 as the aspects to be modeled. However, for all but the simplest of IPFs, this set is inadequate to fully address my goals of attractiveness and readability.
To illustrate this, consider the case where the size of one element in a horizontal menu bar has grown and caused the menu bar to wrap to a second line. A simple solution that focuses only on the expanded menu item would be limited to changing the size of its text or its margins and padding. This could cause the item to look different than the rest of the menu items, reducing the design's attractiveness and consistency. Instead, a better solution might be to increase the size of the containing parent element to allow more space for the expanded menu item, and to adjust surrounding elements to maintain any visual relationships with the menu bar, such as alignment relationships. The analysis in the first step of the approach identifies all such additional elements and their visual relationships. To identify the set of HTML elements, E, whose CSS properties may need to be adjusted, my approach analyzes the baseline web page and the reported IPFs. For each reported IPF, the approach employs one of two strategies based on the IPF's type, r. The first strategy is used if (1) r is a Directional issue, in which one element is placed in a specific direction relative to another element in the baseline (e.g., West) but is placed in a different direction in the PUT (e.g., North), or (2) r is a Containment issue, in which one element contains (i.e., bounds) another element in the baseline, but not in the PUT. To identify elements that need to be modeled to address Directional and Containment issues, my approach finds the element l that is the lowest common ancestor (LCA) of e1 and e2 in the DOM tree of the baseline. Then my approach adds l and all of its successors to E. The reason for this selection of elements is that HTML elements are typically arranged using relative positioning, and modifying sibling or parent elements can influence the position of the other elements.
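The LCA-based element selection for Directional and Containment issues can be sketched as follows. This is an illustrative sketch only: the Node class, field names, and the tiny example DOM are my own assumptions, not CBRepair's actual data structures.

```python
# Sketch of the element-selection heuristic for Directional/Containment
# IPFs: find the lowest common ancestor (LCA) l of e1 and e2 in the DOM
# tree, then add l and all of its successors (descendants) to E.

class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.parent = None
        for c in self.children:
            c.parent = self

def root_path(node):
    """Path from the root down to node, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return list(reversed(path))

def lca(e1, e2):
    """Lowest common ancestor: the last shared node on the two root paths."""
    common = None
    for a, b in zip(root_path(e1), root_path(e2)):
        if a is b:
            common = a
    return common

def descendants(node):
    """node plus all of its successors, depth-first."""
    out = [node]
    for c in node.children:
        out.extend(descendants(c))
    return out

# Tiny illustrative DOM: body > (div > (input1, input2), footer)
input1 = Node("input1")
input2 = Node("input2")
div = Node("div", [input1, input2])
footer = Node("footer")
body = Node("body", [div, footer])

l = lca(input1, input2)
E = {n.name for n in descendants(l)}
print(l.name, sorted(E))  # the div and its subtree are modeled; footer is not
```

Note that the unrelated footer element stays out of E, which keeps the constraint system focused on the subtree that can actually influence the faulty elements.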
Adding l and all of its successors to E is a conservative heuristic that ensures all parent and sibling elements are modeled so that they can be modified to resolve the IPFs. The second strategy is employed if r is an Alignment issue, in which the two elements were aligned with each other along some dimension in the baseline, but not in the PUT. This is the simple situation discussed earlier, and the approach only needs to add the two elements e1 and e2 that became unaligned to E. The mapping from the set of IPF types, R, to each of these three issue types (Directional, Containment, and Alignment) will vary based on the information provided by the IPF detector. Creating this mapping is a one-time effort that must be expended for every unique detector used to provide the input list of IPFs. Once the elements in E have been identified, the approach identifies V : E × E → P(S), where S is the set of visual relationships that my approach captures between elements. For each pair of elements, V represents the visual relationships among those elements that must be either changed or not allowed to change to repair the PUT. The visual relationships between elements in the baseline web page represent the correct layout that is needed in the PUT. To identify these relationships, my approach renders the baseline in a web browser and uses the browser's API to traverse the DOM tree of the rendered baseline web page. For each HTML element h in the DOM, my approach collects h's position and size. If h contains a text element t inside it (e.g., text between <p> tags or the placeholder text of an <input> element), then my approach also collects t's position and size. After collecting the positions and sizes of all the elements in the baseline, my approach computes the visual relationships between the elements in E. For each pair of elements e1, e2 ∈ E, my approach computes the visual relationship between them by comparing their coordinates.
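The coordinate comparison can be sketched as below for a subset of the relationships in Table 4.1. The Box representation, the zero tolerance, and the specific relationship subset are my own illustrative assumptions; CBRepair's actual implementation reads these coordinates from the rendered DOM via Selenium.

```python
# Sketch: derive visual relationships between two elements from their
# rendered bounding boxes, following the comparisons described in the text
# (e.g., e1.bottom <= e2.top yields Top-Bottom; e1.left == e2.left yields
# Left-Aligned).

from collections import namedtuple

Box = namedtuple("Box", "left top right bottom")

def relationships(b1, b2):
    """Return the set of visual relationships that hold between b1 and b2."""
    rels = set()
    if b1.right <= b2.left:
        rels.add("Left-Right")
    if b1.bottom <= b2.top:
        rels.add("Top-Bottom")
    if b1.left == b2.left:
        rels.add("Left-Aligned")
    if b1.right == b2.right:
        rels.add("Right-Aligned")
    if b1.top == b2.top:
        rels.add("Top-Aligned")
    if b1.bottom == b2.bottom:
        rels.add("Bottom-Aligned")
    if (b1.left <= b2.left and b1.top <= b2.top and
            b1.right >= b2.right and b1.bottom >= b2.bottom):
        rels.add("Contains")
    return rels

# Illustrative boxes: a "first name" input to the left of a "last name"
# input, with both inputs sharing the same top and bottom edges.
first = Box(left=10, top=20, right=110, bottom=40)
last = Box(left=120, top=20, right=220, bottom=40)
print(relationships(first, last))
```

A real implementation would likely need a small pixel tolerance on the equality checks, since browser layout can produce sub-pixel coordinates.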
For example, if e1-bottom ≤ e2-top, then the relationship set would include Top-Bottom. Similarly, if e1-left = e2-left, then the set would include Left-Aligned. The other relationships are computed in an analogous manner. Table 4.1 shows the set of visual relationships S that my approach captures between the elements in E. The first column shows the complete list of visual relationships the approach computes between the elements.

Table 4.1: Constraints that need to be enforced between the variables of two elements e1 and e2 in the PUT, based on the visual relationships between these elements in the baseline web page

Relationship    | Constraint(s) to enforce it in the PUT
Left-Right      | e1-right + e1-margin-right ≤ e2-left − e2-margin-left
Top-Bottom      | e1-bottom + e1-margin-bottom ≤ e2-top − e2-margin-top
Left-Aligned    | e1-left = e2-left
Top-Aligned     | e1-top = e2-top
Right-Aligned   | e1-right = e2-right
Bottom-Aligned  | e1-bottom = e2-bottom
Contains        | e1-left + e1-border-left + e1-padd-left ≤ e2-left − e2-left-margin
                | e1-top + e1-border-top + e1-padd-top ≤ e2-top − e2-top-margin
                | e1-right − e1-border-right − e1-padd-right ≥ e2-right + e2-right-margin
                | e1-bottom − e1-border-bottom − e1-padd-bottom ≥ e2-bottom + e2-bottom-margin
Side-Contains   | e1-left + e1-border-left + e1-padd-left ≤ e2-left − e2-left-margin
                | e1-right − e1-border-right − e1-padd-right ≥ e2-right + e2-right-margin
Same            | e1-left + e1-border-left + e1-padd-left = e2-left + e2-border-left + e2-padd-left
                | e1-top + e1-border-top + e1-padd-top = e2-top + e2-border-top + e2-padd-top
                | e1-right − e1-border-right − e1-padd-right = e2-right − e2-border-right − e2-padd-right
                | e1-bottom − e1-border-bottom − e1-padd-bottom = e2-bottom − e2-border-bottom − e2-padd-bottom

A pair of elements (e1, e2) can have one or more relationships between them. These relationships have three types. (1) Directional: Left-Right, which means e1 is on the left side of e2, or Top-Bottom, which means e1 is on top of e2. (2) Alignment: Right-Aligned, Left-Aligned, Top-Aligned, and Bottom-Aligned.
These determine whether e1 and e2 are aligned with each other along one (or more) sides. (3) Containment: Side-Contains, if the left and right sides of e1 bound the left and right sides of e2; Contains, if the left, right, top, and bottom sides of e1 bound those of e2; and Same, if both e1 and e2 have the same position and size.

4.1.2 Step 2: Converting visual relationships to constraints

The goal of this step is to translate the visual relationships among the elements in E into a system of constraints. The variables in this system represent the sizes and positions of the elements in E, and the constraints among these variables represent their correct intended layout, which can be obtained by analyzing the baseline web page. This system of constraints is then solved in the third step of my approach (Section 4.1.3) to generate a repair for the IPFs. My approach defines the system of constraints for the PUT layout as a Linear Programming (LP) problem. LP is a method to find an optimal solution for a set of constraints given an objective function. The constraints in a linear program are defined as a set of equalities and inequalities over real variables. The objective function is a linear function that needs to be maximized or minimized while satisfying the constraints placed on the variables.

[Figure 4.3: CSS Box Model. An element's box consists of a content area surrounded by padding, border, and margin areas, each with top, bottom, left, and right components; the figure also shows the element's left, right, top, and bottom positions and the difference between the content-box width and the border-box width.]

More formally, a linear program consists of:

• A set of real variables, e.g., x1, x2, x3.
• A set of linear constraints over the variables, e.g., 3x1 + 2x2 ≤ 4, x1 + 2x3 ≤ 8.
• A linear function to be maximized or minimized, e.g.,
minimize f(x1, x2, x3) = 2x1 − x2 + x3.

The LP representation is a good fit for my approach, since the variables can represent the numeric values of layout-related CSS properties and all of the visual relationships shown in Table 4.1 can be represented using linear constraints. Additionally, the use of LP allows my approach to specify an objective function so it can pick a preferred solution if multiple solutions satisfy the constraints. Unlike other methods of modeling constraints, such as constraint programming, LP can efficiently find an optimal solution in polynomial time. For my approach, I used LP to model the constraints that need to be satisfied to repair a faulty translated web page. The following is a detailed description of how the variables and constraints of the LP are defined in my approach.

4.1.2.1 Variables

My approach defines the variables in the LP to be the CSS properties that can be set for each element in E to change the way it is sized or positioned in a page. The properties that can be set for this purpose, and their relationships to each other, are formally defined by the W3C's CSS Box Model [6]. Figure 4.3 shows the different properties and relationships defined by the CSS Box Model. The box model consists of four areas: (1) the content area, which represents the area where the actual content of the element appears; (2) the padding area, which is an empty transparent area around the content; (3) the border area, which surrounds the padding and content areas; and (4) the margin area, which is an empty transparent area outside the border. Each of these four areas can either be defined for all four sides of the box, or individual sides can be defined using the modifiers "top," "bottom," "left," and "right." In total, this yields 16 variables that can be defined for each HTML tag element: four variables representing the positions of the four sides of the HTML element, and 12 variables representing the size of each side of the padding, border, and margin areas.
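The per-element variable set can be enumerated as follows. The naming scheme mirrors the variable names used in Table 4.1 (e.g., e1-padd-left), but the exact identifiers are an illustrative assumption rather than CBRepair's internal naming.

```python
# Sketch: the 16 LP variables the approach defines per HTML element --
# four side positions plus per-side padding, border, and margin sizes.

SIDES = ("left", "top", "right", "bottom")
AREAS = ("padd", "border", "margin")

def element_variables(elem_id):
    """Names of the LP variables modeling one element's CSS box."""
    names = [f"{elem_id}-{s}" for s in SIDES]                      # 4 positions
    names += [f"{elem_id}-{a}-{s}" for a in AREAS for s in SIDES]  # 12 area sizes
    return names

vars_e1 = element_variables("e1")
print(len(vars_e1))  # 16
```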
By HTML definition, text elements are represented only by the positions of the four sides of the text's content area; they do not have attributes representing margin, padding, or border.

4.1.2.2 Constraints

My approach uses the constraints in the LP to describe the correct visual relationships among the elements. The constraints are expressed as linear constraints over the variables defined for each element. The approach generates two types of constraints. The first type of constraint is derived directly from the visual relationships, V, identified in the previous step. These constraints restrict the variables in the LP to values that place the elements in the PUT in a layout that resembles the baseline web page's layout. For each extracted relationship v ∈ V between two elements e1, e2 ∈ E, my approach adds constraints on the variables in the LP to enforce the same visual relationships on the PUT. The approach generates different types of constraints for the different types of visual relationships. Table 4.1 shows the mapping of visual relationship type (first column) to the specific generated constraints (second column).

[Figure 4.4: Example of two solutions that satisfy the constraints and repair the IPF in the PUT. (a) A solution that satisfies the constraints with minimum changes to the page. (b) A solution that satisfies the constraints but introduces unnecessary additional changes.]

To illustrate, consider the example in Figure 4.1. For the left-right visual relationship between the firstname input (e1) and the lastname input (e2), the following constraint is added to the LP:
e1-right + e1-margin-right ≤ e2-left − e2-margin-left

For the right-aligned relationship between the lastname input (e2) and the email input (e3), the following constraint is added to the LP:

e2-right = e3-right

The second type of constraint preserves the size of text in the repaired PUT. I add this type of constraint to the system to prevent the approach from reducing the text size in the new layout, which could negatively impact the readability of the repaired PUT. My approach does this indirectly by constraining the variables that represent the positions of the four sides of the content area of each text element to values that maintain the size of the text. For each text element t that has a width w and height h in the PUT, the approach adds the constraint t-right − t-left = w and the constraint t-bottom − t-top = h to the LP. For the running example, the following constraints are added to the LP for the variables representing the "nome di battesimo" placeholder, ep: ep-right − ep-left = width of ep (constant), and ep-bottom − ep-top = height of ep (constant), where ep-left, ep-right, ep-top, and ep-bottom are the variables representing the four sides of the placeholder's text box.

4.1.3 Step 3: Solving constraints and producing a repair

The third and last step of my approach solves the constraint system and generates the repaired PUT. The solution to the constraints represents the new values for the content, padding, border, and margin of the elements in the PUT, and together these values represent a correct layout of the PUT that resolves the IPFs. Multiple solutions can satisfy the constraints generated by the approach. However, not all solutions are necessarily equivalent in their impact on the aesthetics of the web page. For example, Figure 4.4 shows two solutions for the running example. The first solution repairs the PUT by increasing the width of the input fields and the outer container to a value that is "just enough" to resolve the IPF.
The second solution also repairs the PUT, but makes unnecessary additional increases to the width of the input fields and the outer container. I hypothesize that repairs that make minimal changes to a web page have better aesthetic quality, since they do not deviate as much from the original design of the web page. The results of the evaluation (Section 4.2) support this hypothesis. My approach tries to maximize the aesthetic quality of the generated repair by favoring solutions that yield the fewest modifications to the layout of the repaired page. In other words, the approach favors solutions that resolve IPFs with the minimum change possible. Linear programming allows my approach to decide which solutions should be favored by defining an objective function over the constraints. In my approach, I define the objective function to favor new content sizes, margins, paddings, and borders that are as close as possible to the original values. More formally, the objective function is:

minimize Σ_{i=1}^{m} c_i · |var_i − org_i|

where var_1, var_2, ..., var_m are all the variables defined in the LP; org_1, org_2, ..., org_m are the original values for these variables, obtained from the unfixed PUT; and c_1, c_2, ..., c_m are constant coefficients that are set depending on the type of the variable. My approach sets c_i to high values for margin, padding, and border variables to make changing them less desirable than changing the variables that control the content size or the positions of the elements. My approach penalizes margin, padding, and border changes more because they typically have small values, and a minor change in their values has a disproportionately large impact on a page's layout. The objective function defined above is non-linear, which requires me to linearize it before using it in the LP. A well-known linearization technique for such an objective function is described in the linear programming literature [29].
The basic idea of this linearization technique is as follows. If we have an LP of this form:

minimize c1 · |var1 − org1| + c2 · |var2 − org2|
such that: var1 > var2

it is equivalent to:

minimize c1 · t1 + c2 · t2
such that:
var1 − org1 ≤ t1
−(var1 − org1) ≤ t1
var2 − org2 ≤ t2
−(var2 − org2) ≤ t2
var1 > var2

This linearization technique reformulates the objective function by introducing a new variable t_i to replace each absolute-value term |var_i − org_i|, and adding to the LP two new constraints, t_i ≥ var_i − org_i and t_i ≥ −(var_i − org_i), for each newly introduced variable t_i. After solving the LP, my approach modifies the PUT by changing the content size, padding, margin, and border of the elements in the PUT using the values produced by the solver. The new width and new height of each element e in the PUT are computed based on the value of the element's CSS property box-sizing. This CSS property determines whether the padding and border are included when computing the width and height of the element. Figure 4.3 shows the difference between the two box-sizing options. If the value of box-sizing is set to "border-box", the new width is e-right − e-left, and the new height is e-bottom − e-top. If the value of box-sizing is set to "content-box", the new width is (e-right − e-border-right − e-padd-right) − (e-left + e-border-left + e-padd-left), and the new height is (e-bottom − e-border-bottom − e-padd-bottom) − (e-top + e-border-top + e-padd-top). In both cases, if the newly computed width exceeds the value of the max-width CSS property, the value of max-width is adjusted to the new width. A similar process is applied for the min-width, max-height, and min-height CSS properties. The padding, border, and margin of each element e in the PUT are updated with the new values produced by the solver for the padding, border, and margin variables defined for e in the LP.
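The box-sizing arithmetic above can be sketched as a small helper. The dictionary of solved variable values is an illustrative assumption for how solver output might be passed around; the formulas themselves follow the text directly.

```python
# Sketch: recover an element's new CSS width from the solved side
# variables, honoring the element's box-sizing property as described above.

def new_width(v, box_sizing="content-box"):
    """v maps variable names (e.g. 'right', 'border-left') to solved values."""
    if box_sizing == "border-box":
        # border-box width spans from the left edge to the right edge
        return v["right"] - v["left"]
    # content-box width excludes border and padding on both sides
    return ((v["right"] - v["border-right"] - v["padd-right"]) -
            (v["left"] + v["border-left"] + v["padd-left"]))

# Illustrative solver output for one element
solved = {
    "left": 100, "right": 260,
    "border-left": 2, "border-right": 2,
    "padd-left": 8, "padd-right": 8,
}
print(new_width(solved, "border-box"))   # 160
print(new_width(solved, "content-box"))  # 140
```

The analogous computation for the new height substitutes top/bottom for left/right.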
My approach applies the new values for the CSS properties using a language-specific CSS patch for the web page. The patch is embedded in a CSS :lang() selector. This selector allows my approach to specify new, alternative values for CSS properties based on the language in which the page is viewed. For the running example, Figure 4.4a shows the repaired PUT produced by my approach. To satisfy the "contains" relationship constraint between the firstname input and its placeholder, the approach produces new values for the two variables that represent the width of the firstname input e1 (i.e., e1-right and e1-left). The new values for e1-right and e1-left have to satisfy the containment constraint with the translated placeholder, which requires the value of e1-right to increase because the right side of the placeholder has a value larger than e1-right. This increase in the value of e1-right results in increasing the width of the firstname input to a new value that makes it contain the placeholder. Similarly, to satisfy the right-alignment constraint between the lastname input e2 and the email input e3, the approach finds a new value for e3-right that makes it equal to e2-right. This new value requires an increase to the email input's width to make it right-aligned with the new position of the lastname input. A similar process is applied to all the other IPFs that appear in the PUT until they are all resolved.

4.2 Evaluation

I conducted an empirical evaluation that focused on the following research questions:

RQ1: How effective is my approach in repairing IPFs?
RQ2: How long does it take for my approach to generate repairs?
RQ3: What is the quality of the repairs generated by my approach?

4.2.1 Implementation

My approach is implemented as a Java prototype tool, CBRepair (Constraints-Based Repair), which leverages several third-party libraries.
Selenium WebDriver was used to automatically load the web pages and extract the rendered DOM information. I used the Java interface of the Google Optimization Tools (OR-Tools) linear programming solver to solve the constraints, as discussed in Section 4.1.3 [9]. JavaScript code was executed via Selenium WebDriver to retrieve the Box Model and XPath locator for each element, as well as to apply the repair patch to the PUT. To maintain consistency, all web pages were rendered with Firefox version 46.0.1 maximized to a fixed screen resolution of 1920×1080 pixels. All experiments were conducted on a single Intel Core i7-4790 64-bit machine with 32GB of memory, running Ubuntu Linux 16.04 LTS.

4.2.2 Subjects

I conducted the evaluation on a set of 23 real-world subject web pages, which are shown in Table 4.2. The column "#HTML" shows the total number of HTML elements in the subject page, giving a rough estimate of its size and complexity.

Table 4.2: Subjects used to evaluate CBRepair

ID  Name           URL                             #HTML  Baseline  Translated
1   akamai         https://www.akamai.com            304  English   Spanish
2   caLottery      http://www.calottery.com          777  English   Spanish
3   designSponge   http://www.designsponge.com     1,184  English   Spanish
4   dmv            https://www.dmv.ca.gov            638  English   Spanish
5   doctor         https://sfplasticsurgeon.com      689  English   Spanish
6   els            https://www.els.edu               483  English   Portuguese
7   facebookLogin  https://www.facebook.com          478  English   Bulgarian
8   flynas         http://www.flynas.com           1,069  English   Turkish
9   googleEarth    https://www.google.com/earth      323  Italian   Russian
10  googleLogin    https://accounts.google.com       175  English   Greek
11  hightail       https://tinyurl.com/y9tpmro7    1,135  English   German
12  hotwire        https://www.hotwire.com           583  English   Spanish
13  ixigo          https://www.ixigo.com/flights   1,384  English   Italian
14  linkedin       https://www.linkedin.com          586  English   Spanish
15  mplay          http://www.myplay.com           3,223  English   Spanish
16  museum         https://www.amnh.org              585  English   French
17  qualitrol      http://www.qualitrolcorp.com      401  English   Russian
18  rentalCars     http://www.rentalcars.com       1,011  English   German
19  skype          https://tinyurl.com/ycuxxhso      495  English   French
20  skyScanner     https://www.skyscanner.com        388  French    Malay
21  twitterHelp    https://support.twitter.com       327  English   Russian
22  westin         https://tinyurl.com/ycq4o8ar      815  English   Spanish
23  worldsBest     http://www.theworlds50best.com    581  English   German

The column "Baseline" shows the language of the subject's baseline version, which shows the correct appearance of the page, and "Translated" shows the language that exhibits IPFs in the subject with respect to the baseline. The subjects include web pages containing a single IPF as well as web pages with multiple IPFs occurring together. These subjects are the same subjects that were used in the evaluations of IFix [40] and GWALI [21]. These particular subjects were chosen to cover a wide range of translation technologies and frameworks, with diversity in size, layouts, and styles. The original sources of these subjects are: (1) builtwith.com, a website that indexes web applications built using various technologies; (2) the Alexa top 100; and (3) popular, high-profile websites that target international audiences with multiple languages. I used the same set of subjects to facilitate a more direct comparison between the two approaches. The complete set of subjects' files can be found on the project website [4].
Table 4.3: Results for CBRepair effectiveness (RQ1) and efficiency (RQ2)

                    RQ1                       RQ2
Subject        #Before  #IFix  #CBR      IFix (s)  CBR (s)  Δ
akamai               6      0  4 (FP)         153     1.57  99%
caLottery            4      0  4              106     8.54  92%
designSponge         9      1  0             1184     9.47  99%
dmv                 13      0  13             408     4.07  99%
doctor              21      0  0               99     2.87  97%
els                  6      0  0              125     3.75  97%
facebookLogin       16      0  14 (FP)        417     5.48  99%
flynas               9      0  9              379    64.42  83%
googleEarth         15      0  4 (FP)         163     2.39  99%
googleLogin          6      0  0               92     1.45  98%
hightail             2      0  0               95    13.67  86%
hotwire             30      0  30             353     4.66  99%
ixigo               38     12  32             725    10.60  99%
linkedin            22      0  6 (FP)         130     2.28  98%
mplay               76   0.53  44            1180   114.54  90%
museum              32   0.13  19             415    10.19  98%
qualitrol           19      0  7 (FP)         123     2.02  98%
rentalCars           6      0  2              297     9.27  97%
skype                3      0  0              127     1.81  99%
skyScanner           4      0  0               96     3.12  97%
twitterHelp          5      0  0               87     1.65  98%
westin              11      1  11             167     9.22  94%
worldsBest          24      0  19             326     5.56  98%

4.2.3 Experiment One

To answer RQ1 and RQ2, I evaluated and compared the repairs performed by CBRepair and IFix. For RQ1, I measured the number of IPFs detected by GWALI before and after the repair. The experiment was carried out in three steps: (1) GWALI was run on each subject web page's translated version (i.e., the PUT) to determine the number of IPFs present before the repairs. (2) CBRepair and IFix were independently run on each subject web page to produce a repaired version (PUT′). (3) GWALI was run on each subject web page's PUT′ to determine the number of unresolved IPFs remaining in each page. Since IFix uses a non-deterministic search-based technique, I ran IFix on each subject 30 times, averaging the number of resulting IPFs. For RQ2, I evaluated my approach by measuring and comparing the running times for CBRepair and IFix to generate the repairs. To mitigate the non-determinism of IFix, each subject's IFix running time was also computed as an average across the 30 runs.

4.2.3.1 Presentation of Results

Table 4.3 shows the results of Experiment One. The initial number of IPFs for each subject is shown in the column "#Before".
Under RQ1, the columns "#CBR" and "#IFix" show the number of IPFs remaining after applying CBRepair and IFix, respectively (note that the values for IFix are averages of 30 runs and may therefore contain decimal values). Under RQ2, the column "CBR" shows the running time for each subject (in seconds) for CBRepair. The column "IFix" shows the average running time (in seconds) for each subject across the 30 runs of IFix. The column "Δ" shows the percentage reduction when comparing CBRepair's running time to IFix's average running time. The running time of CBRepair averaged 13 seconds with a median of 4.7 seconds, and the running time of IFix averaged 315 seconds with a median of 163 seconds.

4.2.3.2 Discussion of Results

Overall, the results of experiment one show that CBRepair was able to significantly reduce the number of IPFs in the subject applications. For 19 of the 23 subjects, CBRepair was able to decrease the number of IPFs detected in the after version of the page, with an overall average reduction across all subjects of 54%. Of those 19 improved subjects, eight were completely repaired, with GWALI reporting zero remaining IPFs. After visually inspecting the other 11 subjects, I determined that five of them were completely repaired by CBRepair, but GWALI erroneously reported them as still containing IPFs (i.e., GWALI reported five false positives; I marked them with "FP" in Table 4.3). This brings the average IPF reduction of CBRepair up to 65%. I investigated the IPFs that my approach could not repair and found two scenarios where this occurred. The first, which occurred in one subject, was that some constraint systems were infeasible and a satisfying solution could not be found by the solver. I confirmed this by manually inspecting the subject and found it impossible to repair the IPFs without reducing the font size. The second reason, for the remaining non-repaired subjects, was that these web pages used CSS properties that could not be accurately modeled by a linear constraint system.
An example of this is the "float" CSS property. I believe that these types of properties could be modeled with more expressive constraint systems, and I intend to explore this in future work. Despite the successful reduction of IPFs by my approach, IFix was able to achieve a higher reduction. Overall, IFix had an average IPF reduction of 98% and could completely resolve the IPFs in 18 of the 23 subjects. I analyzed the results to better understand why IFix was able to outperform my approach in RQ1. My investigation found that my approach's restriction on modifying the font size was a likely cause for this disparity: IFix reduced the font size in 21 of the 23 subjects' repairs. To evaluate this possibility, I modified IFix to prevent it from allowing repairs that reduced the font size (I refer to this version as IFix′). I found that IFix′ was able to resolve only four of the 23 subjects, with a 29% average reduction of IPFs. This indicates that IFix has limited ability to repair IPFs without modifying the font size. As I show in RQ3, although these repairs allowed IFix to perform well in RQ1, the user-perceived quality of the IFix repairs was lower than that of those generated by my approach, a result that could reflect the impact of the font size reductions. For RQ2, the results indicate that CBRepair's analysis was significantly faster than IFix in generating repairs. Breaking down the average time of my approach to the granularity of individual steps, 4.76 seconds (37%) were required to extract relationships (Section 4.1.1), 0.11 seconds (1%) to extract constraints (Section 4.1.2), and 7.86 seconds (62%) to solve and repair (Section 4.1.3). CBRepair had an average repair time of less than 13 seconds versus an average repair time of over 5 minutes for IFix, an average time reduction of 96%.

4.2.4 Experiment Two

To answer RQ3, I conducted user surveys to measure the visual quality of the generated repairs from a human perspective.
In the surveys, I asked users to compare a series of two side-by-side UI snippets against snippets from the baseline page. A UI snippet is an image of the area where an IPF occurred, obtained by cropping the subject web page's screenshot. There were two variants of the side-by-side snippets. The first compared snippets from the PUT against the version repaired by CBRepair, and the second compared snippets from the version repaired by CBRepair against snippets from the version repaired by IFix. Within each variant, the order in which the snippets were displayed was randomized, and the snippets were labeled only Version 1 and Version 2. Examples of these snippets are shown in Figure 4.5. I only compared snippets for the subjects where CBRepair was able to reduce the number of IPFs, resulting in a total of 24 evaluated IPFs. Participants were asked to rate the two UI snippets based on three metrics: (1) attractiveness, (2) readability, and (3) similarity to the baseline page. The ratings were based on a numeric scale from 1 to 10, where 1 represents the least attractive/readable/similar and 10 represents the most attractive/readable/similar. To conduct the surveys about the repairs, I used the Amazon Mechanical Turk (AMT) service. The participants of the study were anonymous to me; however, to ensure the quality of the responses, I only allowed participants who had a minimum 95% approval rating on at least 100 previously completed tasks. I followed AMT best practices by employing a captcha and a check question to reduce the likelihood that the survey was completed by a bot. Due to failing these checks, 35% of the completed surveys were removed from the analysis. For each survey, 20 unique workers participated, each of whom was paid $0.10 for completing the survey.

[Figure 4.5: Example of different repair versions of UI snippets: (a) PUT before repair, (b) IFix repaired, (c) CBRepair repaired.]
4.2.4.1 Presentation of Results

The results for the appearance, readability, and similarity ratings provided by the participants are shown in Figure 4.6. The box plot in Figure 4.6a shows the results when snippets of CBRepair were compared with those of the PUT. The box plot in Figure 4.6b shows the results when snippets from CBRepair were compared against those from IFix. The boxes in the figures represent the distribution of the numeric ratings given by the participants, with the average rating marked as an 'x' and the median marked by a bar inside the box.

As can be seen from Figure 4.6a, when compared to the PUT, the average attractiveness score increased from 6.3 to 7.1 (12%), the average readability score increased from 6.6 to 7.0 (7%), and the average baseline-similarity score increased from 6.3 to 7.4 (18%). These results were statistically significant using the Wilcoxon signed-rank test, with p-values of 3.948e-05, 0.01654, and 3.68e-08, respectively. The Wilcoxon signed-rank test was used for this analysis because I was comparing paired ratings from two different groups and the measured ratings were not normally distributed.

When compared with IFix, Figure 4.6b shows that the average attractiveness score increased from 6.3 to 6.8 (8%), the average readability score increased from 5.9 to 7.0 (19%), and the average baseline-similarity score increased from 6.8 to 7.0 (2%). Both attractiveness and readability were statistically significant, with p-values of 0.006957 and 1.358e-07, but baseline-similarity was not.

4.2.4.2 Discussion of Results

Overall, the results of experiment two show that users perceived the quality of the CBRepair-generated repairs as more attractive and readable than the PUT and the repairs generated by IFix. Below we discuss the results for each of the three metrics in more detail. For readability, my approach significantly outperformed IFix, with an average increase in score of 19%. The results versus the PUT were also improved, but by only 7%.
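The statistical check described above is straightforward to reproduce. As an illustration (this is not the dissertation's actual analysis script, and the rating values below are made up for the example), the following pure-Python sketch computes a two-sided Wilcoxon signed-rank test on paired ratings using the normal approximation:

```python
import math

def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test for paired samples,
    using the normal approximation (no tie correction)."""
    diffs = [b - a for a, b in zip(x, y) if b != a]  # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank the absolute differences, averaging ranks across ties.
    abs_sorted = sorted(abs(d) for d in diffs)
    def avg_rank(v):
        first = abs_sorted.index(v) + 1
        last = first + abs_sorted.count(v) - 1
        return (first + last) / 2
    w_plus = sum(avg_rank(abs(d)) for d in diffs if d > 0)
    w_minus = sum(avg_rank(abs(d)) for d in diffs if d < 0)
    w = min(w_plus, w_minus)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return w, p

# Hypothetical paired ratings (1-10) from the same participants for the
# unrepaired page (PUT) and a repaired page; the values are invented.
put_ratings = [6, 5, 7, 6, 6, 5, 7, 6, 5, 6, 7, 6]
repair_ratings = [7, 7, 8, 7, 7, 7, 8, 7, 6, 7, 8, 7]
w, p = wilcoxon_signed_rank(put_ratings, repair_ratings)
print(f"W = {w}, two-sided p = {p:.4f}")
```

Because every hypothetical participant rated the repaired page higher, the negative rank-sum is zero and the test rejects the null hypothesis at the usual 0.05 level; production analyses would typically use a library implementation such as an exact-distribution variant for small samples.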
We investigated the repairs in more detail to identify possible reasons for this improvement. To do this, I examined the results that showed the largest decreases in their readability rating and found that these generally correlated with the largest decreases in font size in the repairs generated by IFix. For example, in the most extreme case, the readability score dropped 68% and the font size was reduced from 12px to 7px (a 40% reduction). In the second most extreme case, the readability score dropped 50%, and the font size was reduced from 12px to 8px (a 33% reduction). The idea that font size played a role in this score change is also supported by the comparison of my approach's results against those of the PUT, where the font size was the same. Here I saw a smaller improvement in the readability score, likely reflecting only the impact of the IPFs' layout distortion on the readability of the page.

For attractiveness, the repairs generated by my approach were rated more attractive than both the PUT and the pages generated by IFix. While the improved attractiveness versus the PUT is not meaningful by itself, it represents a sanity check that the resulting repair is something that users would consider an improvement over the unrepaired web page. I investigated the repairs in more depth to understand the reasons behind my approach's higher ratings versus IFix. To do this, I again examined the results that showed the largest decreases in their attractiveness ratings and found that these also generally correlated with the largest decreases in font size in the repairs generated by IFix.

For similarity to the baseline, the results were also positive. Users perceived my approach's repairs as leading to pages significantly more similar to the baseline than the pages that contained the IPFs (i.e., the PUTs).
The results also showed that there was no significant difference between the similarity scores awarded to IFix and CBRepair. This is a reasonable result and shows that both approaches are able to produce pages that look similar to the original baseline page.

Figure 4.6: Ratings given by user study participants to assess repair quality ((a) CBRepair versus PUT; (b) CBRepair versus IFix)

Since my results analysis indicated that font size was likely a significant factor affecting attractiveness and readability, we investigated this issue further. I repeated experiment two but compared the repairs generated by the font-size-restricted IFix (IFix′) with the repairs generated by CBRepair. The results showed that the user-perceived quality for IFix′ increased significantly and there was no statistically significant difference between the attractiveness and readability scores of IFix′ and CBRepair. These results suggest that my decision to prevent changes to font size helped my approach to generate repairs that were more attractive and readable than those generated by IFix. However, it is important to note that with this restriction IFix was only able to repair four of the 23 subjects. Incorporating this restriction into IFix would therefore involve other design tradeoffs and adaptations to make it as effective as my approach.

Overall, the results from RQ3 strongly indicate that my approach, CBRepair, can repair IPFs with improvements in attractiveness and readability over the repairs generated by IFix while maintaining the intended baseline look.

4.2.5 Threats to Validity

A potential threat to external validity is that my approach was only applied to web applications chosen by GWALI and IFix. However, the subjects were originally chosen to represent a variety of different translation technologies and layout styles. This helps to ensure that the results can be generalized to a wider selection of web pages.
A threat to construct validity is that the number of IPFs (RQ1) is based on an automated tool. However, GWALI is currently the only available automated tool that can detect and quantify IPFs. Furthermore, I included a user study in RQ3 to support the conclusion that the repairs represented an improvement to the pages.

Another threat to construct validity is that human perception or judgment may be too subjective to correctly judge the repair quality. Some users may judge the quality based on the attractiveness, readability, or similarity to the baseline. I addressed this issue by using all three evaluation metrics to evaluate different aspects of a web page's quality from the users' perspective. Another potential threat is that the participants on AMT may not understand the translated language well enough to provide a valid interpretation of the "readability" metric. However, all survey questions included two snippets, which allows the users to give relative ratings indicating whether they can easily recognize the different characters in the text without needing to actually understand the meaning of the text.

4.3 Summary

Existing techniques to repair IPFs in web applications are slow, and often negatively impact the readability and the attractiveness of the web page. In this chapter, I introduced a new approach to overcome the limitations of existing IPF repair techniques. My approach models the layout of a web page as a system of constraints. Then it uses constraint solvers to quickly find solutions that could be used to repair the web page. The use of constraint solvers allows my approach to maintain the attractiveness of the original web page. The evaluation of my approach shows that it is significantly faster than the existing state-of-the-art technique and that the quality of my approach's repairs is substantially better in both attractiveness and readability.
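To make the constraint-based idea concrete, the toy sketch below is illustrative only (CBRepair itself extracts relationships from the DOM and hands them to an off-the-shelf solver; the function name and its simplifications are mine). It "repairs" a single horizontal row of elements whose baseline relationships are "each element is left of the next" and "all elements stay inside the container" after translation has increased the elements' minimum widths:

```python
def repair_row(container_w, min_widths, gap=0):
    """Toy constraint-based repair for one horizontal row of elements.

    Constraints preserved from the baseline layout:
      x[i] + w[i] + gap <= x[i+1]   (element i stays left of element i+1)
      x[-1] + w[-1] <= container_w  (last element stays inside the container)
      w[i] >= min_widths[i]         (translated text must still fit)

    Returns a list of (x, width) pairs, or None if no assignment satisfies
    the constraints (e.g., the row would have to wrap instead).
    """
    needed = sum(min_widths) + gap * (len(min_widths) - 1)
    if needed > container_w:
        return None  # infeasible: the constraints cannot all hold
    layout, x = [], 0
    for w in min_widths:  # pack left-to-right at the minimum feasible widths
        layout.append((x, w))
        x += w + gap
    return layout

print(repair_row(300, [120, 150], gap=10))  # feasible row
print(repair_row(200, [120, 150]))          # infeasible row
```

A real page involves many more relationship types (containment, alignment, intersection) interacting at once, which is why the approach formulates them for a general constraint solver rather than solving them ad hoc as above.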
Chapter 5
Related Work

In this chapter, I present related work that focuses on different aspects of web GUI and internationalization testing. I divide the related work into three parts: the first part covers work related to the detection of presentation failures (Section 5.1); the second part covers work related to the repair of presentation failures (Section 5.2); the third and last part covers work that generally helps developers in the internationalization process and is not necessarily related to the detection or repair of presentation failures (Section 5.3).

5.1 Detection Techniques

Different techniques [23, 22, 52] have been developed to perform automated checks for several common internationalization problems, such as corrupted text, inconsistent keyboard shortcuts, and incorrect or missing translations. These techniques are designed to run a list of pre-defined tests on the application and are not capable of detecting IPFs. A tool provided by the World Wide Web Consortium (W3C) called the i18n Checker [16], when given a web page, checks a group of internationalization-related properties to make sure they are set properly. The technique scans the HTML and the HTTP response header of the web page and tests them against a pre-defined set of internationalization-related aspects. For example, it checks whether the character encoding is specified in the HTML document and whether the language attribute is added to the HTML tag. The i18n Checker only verifies a web page's syntactic compliance and cannot verify any properties related to the page's layout or appearance.

Several detection approaches have been proposed in the field of Cross-Browser Testing (XBT). WEBDIFF [26] compares the appearance of a web application in different web browsers and identifies differences in the appearance as potential issues that need to be reported to the developers.
Given a web page to be analyzed, WEBDIFF compares its appearance in different browsers by combining a structural analysis of the information in the page's DOM and a visual analysis of the page's appearance that is obtained through screenshots. Another approach, X-PERT [25], is built on top of WEBDIFF. In addition to detecting layout-related issues, X-PERT can also detect cross-browser behavioral issues. X-PERT uses crawling to navigate through the web applications and create navigation models. X-PERT then compares these navigational models to determine if the web application has any cross-browser behavioral issues. Another technique, BrowserBite [56], uses image processing and machine learning techniques to identify differences in the rendering of a web page across different browsers. These approaches, and other cross-browser testing approaches, suffer from over-sensitivity when used to detect IPFs. The reason for this is that in XBT the two versions of the page are expected to match exactly, and a small change in the page that occurs due to the translation of the text would be detected as a failure by these techniques.

WebSee [42, 41, 43] and FieryEye [44] are techniques that utilize computer vision and image processing techniques to compare a browser-rendered test web page with an oracle image to find visual differences. The two techniques are useful in usage scenarios where the page under test is expected to match an oracle (e.g., mock-up-driven development and regression debugging). WebSee and FieryEye suffer from over-sensitivity when used to detect IPFs because they rely on image comparison. Techniques that rely on image comparison are not effective in detecting IPFs because the text in the translated web page uses a different alphabet, and image comparison techniques will flag these changes in the alphabet as IPFs.
Also, in internationalization, the translated web pages inherently have small differences in the sizes of the elements from the baseline web pages, caused by the minor changes in the text size after translation. Image comparison techniques would flag these small differences as IPFs.

Invariant specification techniques, such as Cornipickle [31], Cucumber [7], Crawljax [45], and Selenium WebDriver [15], allow developers to specify assertions on the page's HTML and CSS data. These assertions are then checked against the web page's DOM. These techniques are effective in testing specific desirable properties in web pages, such as checking the visibility of an element after an event is executed. Theoretically, these techniques could be used to detect IPFs. However, developers would need to manually write thousands of assertions to cover all the aspects of the desired layout for each web page, which is not practical in real-world scenarios.

LayoutTest [14] is a software library developed by LinkedIn to test the layout of iOS applications. The library allows developers to define a data specification (dictionary), which is used to generate different combinations of testing data. Then the interface of the app is exercised with the generated data. The library automatically checks for overlapping elements in the UI and allows the developers to write assertions to verify that the layout and the content in the interface are correct. LayoutTest requires manual effort by the developers to write the assertions. As with invariant specification techniques, manually writing assertions to detect IPFs is not practical due to the huge number of assertions that need to be written for each web page.

ReDeCheck [61, 60] is a responsive web design testing tool. The tool models the layout of a responsive web page across multiple browser viewport widths using a model called the Responsive Layout Graph (RLG).
When developers modify the CSS of a responsive web page, ReDeCheck builds a new RLG for the updated page and compares it with the original RLG at all the viewport widths to ensure that the modification did not result in layout failures. The RLG is a model that is similar to the alignment graph used in X-PERT, and thus suffers from the same problem of over-sensitivity when used to detect IPFs. Although the tool is effective in detecting layout faults in responsive web design, it cannot be used to detect IPFs due to its over-sensitivity.

VFDetector [54] is similar to ReDeCheck in that it detects layout faults that are caused by layout transformations in responsive web design. However, VFDetector differs from ReDeCheck by focusing on a specific type of layout fault: HTML elements becoming invisible due to layout transformations. VFDetector models both transformations that are caused by user interaction with the web page and those caused by changing the viewport width of the browser. VFDetector cannot be used to detect IPFs because it focuses on changes in the visibility of the elements, and in most IPFs the visibility of the faulty elements does not change in the translated web page.

Fighting Layout Bugs (FLB) [59] allows developers to detect general correctness properties, such as text that overlaps horizontal or vertical edges or text with too low contrast against the background. A drawback of this approach is that it can only detect a limited set of application-agnostic presentation failures. However, IPFs related to the direction and the alignment of elements are application specific and require a reference web page to be detected, because it is not possible to determine the correct direction and alignment between elements without examining a reference web page. The limitation of FLB detection is confirmed by the evaluation of my IPF detection work, in which many IPFs were missed by the FLB approach.
Another drawback of FLB is that it only outputs a screenshot of the page with the failures marked on it, so the developer has to inspect the screenshot and manually search through the page to identify the faulty elements.

Lastly, extensive work in the area of automated GUI testing [47, 46] focuses on testing the behavior of the system based on event sequences triggered from the user interfaces. These techniques differ from my approach in that they are not focused on testing the appearance of the GUI; instead, they use the GUI to test the behavior of applications.

5.2 Repair Techniques

The most closely related technique to my repair approach is IFix [40]. It uses a search-based technique to automatically repair IPFs in web pages. The technique works by exploring a large number of values for the CSS properties that could repair the faulty elements in a web page. IFix suffers from two main issues. First, it requires extensive time (up to 19 minutes) to find a repair for a single web page. Second, for many web pages, IFix generates limited repairs in which the repaired version has a font size reduced to a very small value, which affects the readability of the web page. In fact, a user study in the evaluation of IFix shows that users did not favor 30% of the patches generated by IFix. The goal of my repair approach is to overcome these two issues by generating repairs efficiently and without compromising the readability of the web page.

XFix [38, 39] and MFix [36] are two different approaches that use the same search-based framework as IFix to automatically repair different types of presentation problems in web applications. XFix focuses on Cross-Browser Testing (XBT) and MFix focuses on the mobile-friendliness of web pages. The fitness functions used by these two approaches target the repair of fundamentally different types of presentation failures, and thus cannot be used to repair IPFs.
Another class of techniques focuses on repairing different types of presentation failures in web applications. PhpRepair [55] and PhpSync [48] focus on repairing problems arising from malformed HTML. IPFs are, however, not caused by malformed HTML, meaning these techniques would not resolve IPFs. Another technique [64] assumes that an HTML/CSS fix has been found for a web page and focuses on propagating the fix to the server-side code using hybrid analysis.

Cassius [51] and its extension VizAssert [50] provide an extensible framework for reasoning about web pages' layout. The framework can be used to repair faulty CSS in web applications by assuming the availability of a set of faulty CSS values and page layout examples that the technique can use to synthesize a repair. The technique uses the CSS from the page layout examples as the oracle to identify the fix values for the faulty CSS. In the IPF domain, however, the pages before and after translation share the same CSS file. Therefore, these techniques are not applicable for repairing IPFs.

Another group of techniques focuses on the automated repair of software programs. For example, SPR [35] and GenProg [65] use search-based techniques for finding a repair for a failure, while FixWizard [49] and FlowFixer [66] use analytical approaches for finding repairs. However, these techniques are not capable of repairing IPFs in web applications because they are structured to work for general-purpose programming languages, such as Java and C.

5.3 Techniques for Supporting Internationalization and Web Design

TranStrL [63, 62] is a technique that assists developers in internationalizing an existing application by isolating hard-coded strings into external resource files. This isolation makes it easier for the developers to internationalize their applications by translating the resource files without making modifications to the code of the application.
Although helpful in internationalization efforts, such a technique cannot be used to detect or repair IPFs.

Apple provides users of its Xcode IDE with the ability to test the layout of their iOS apps and the apps' ability to adapt to internationalized text [2]. This is performed using "pseudo-localization," a process that replaces all of the text in the application with dummy text that has some problematic characteristics. The new text is typically longer and contains non-Latin characters. This helps in the early detection of internationalization faults in the apps. The technique also tests right-to-left languages by changing the direction of the text. Manual effort from the developers is still needed, though, since they have to verify that elements in the GUI reposition and resize appropriately after the pseudo-localization and direction change.

Finally, Responsive Web Design (RWD) and constraint-based design approaches (e.g., [58, 24, 17, 3]) are effective in designing layouts that adapt to different screen sizes, and their use can help reduce the appearance of IPFs. However, these techniques cannot guarantee that the resulting pages will be IPF-free. Frameworks, such as Bootstrap, require developers to annotate elements with classes that have pre-defined responsive behaviors, while techniques, such as DECOR, require developers to provide "user-specified design constraints". These specifications are limited, which makes these techniques unable to prevent all types of IPFs. Also, these specifications are provided manually by developers, which makes specifying them time consuming and error-prone; it is easy for the developers to specify wrong annotations or constraints. Underscoring this point, in the evaluation of my repair technique, two of the experiment's subjects (doctor and twitterHelp) were designed using Bootstrap and contained IPFs that could be repaired by my approach.
Chapter 6
Conclusion

The goal of my research is to aid developers in detecting, localizing, and repairing Internationalization Presentation Failures (IPFs) that are exhibited in web applications. The hypothesis of my dissertation is: Internationalization Presentation Failures can be detected and repaired with high effectiveness and efficiency using approaches based on modeling the visual relationships among elements in the web page. In my dissertation I confirmed this hypothesis by designing and evaluating two approaches for detecting and repairing IPFs in web applications.

The first approach I designed and developed was for detecting and localizing IPFs in web applications. The approach uses models of the visual relationships between elements to determine if there are IPFs in the translated web page. It also uses the same visual relationship models to localize the set of HTML elements that are causing the failures. I empirically evaluated the effectiveness and efficiency of this approach and compared it against three existing related techniques. The results of the evaluation showed that the detection mechanisms employed by existing techniques were not effective when used to detect IPFs, as they reported a large number of false positives (with precision ranging from 51% to 70%). On the other hand, my detection approach was able to detect IPFs with 92% precision and 100% recall. My approach could localize the faulty element that caused the IPF with a median rank of two and with an average running time of 8 seconds per web page. These results were positive and confirmed that my approach is highly effective and efficient in detecting and localizing IPFs.

The second approach I designed and developed was for repairing web pages that exhibit IPFs. My repair approach uses a model of the visual relationships between elements in the baseline web page to determine the correct layout of the faulty translated web page.
Then my approach uses that same model to repair the translated web page and make its appearance match the baseline. I empirically evaluated the effectiveness and efficiency of my repair approach and compared it against the state-of-the-art IPF repair technique, IFix. I also conducted a user study to evaluate the repair quality from a user perspective to quantify the impact of the generated repairs on the attractiveness and the readability of the web pages. The results of the evaluation showed that my approach was faster than IFix, reducing the time required for the repair by 96%. My approach required only 13 seconds, on average, compared to the average running time of 318 seconds required by IFix. Furthermore, in a side-by-side comparison of pages repaired by my approach and IFix, users rated my approach's repaired pages as more attractive and readable. The average readability rating given to my approach's repairs was 19% higher than the average rating of the repairs generated by IFix. Also, the average attractiveness rating given to my approach's repairs was 8% higher than the average rating of the repairs generated by IFix. These results indicate that my repair approach can help web developers to efficiently produce more attractive and readable fixes for internationalized web pages.

Overall, my two approaches, which are based on modeling the visual relationships between elements, demonstrated high effectiveness and efficiency in detecting and repairing IPFs. These results confirmed the hypothesis of my dissertation and indicated that visual relationship models can be useful for automatically debugging IPFs in web applications.

6.1 Future Directions

My two approaches are easily extensible. Developers who use my two approaches to detect and repair IPFs can easily extend them and apply a stricter or more relaxed set of visual relationships that need to be enforced in the translated web pages.
In future work, we can explore which sets of visual relationships are most important to developers and have a higher impact on the appearance of a web page.

In addition, other types of presentation failures can be addressed using models based on visual relationships. My dissertation focused on one type of presentation issue: the failures that arise from internationalizing web applications. There are many other types of presentation issues, including cross-browser issues, mobile-friendliness issues, usability issues, and accessibility issues. Constraint-based approaches could be used to repair many of these issues because various aspects of the layout of a web page can be modeled as a system of constraints. For example, maintaining a certain level of spacing between elements in a web page can be modeled as constraints to improve the usability and the mobile friendliness of a web page. The techniques I proposed in my dissertation for detecting and repairing IPFs could be adapted to target this usability issue.

Another possible direction for future work is to target internationalization issues in mobile applications. Addressing IPFs in mobile apps poses new challenges because the limited screen size can make the repairs generated by my approach unacceptable. For example, repairs that include increasing the width of elements might make these elements not fit on a mobile screen. Such limitations require a repair strategy that takes into consideration the limited screen size.

References

[1] Android RelativeLayout Documentation. https://developer.android.com/reference/android/widget/RelativeLayout/.
[2] Apple Internationalization and Localization Guide. https://developer.apple.com/library/content/documentation/MacOSX/Conceptual/BPInternational/TestingYourInternationalApp/TestingYourInternationalApp.html.
[3] Bootstrap Library. http://getbootstrap.com.
[4] CBRepair Project. https://sites.google.com/site/cbrepairprojectsite/.
[5] Compact Language Detector 2.
https://github.com/CLD2Owners/cld2.
[6] CSS Box Model. https://www.w3.org/TR/CSS2/box.html.
[7] Cucumber. http://cukes.info/.
[8] Firebug. https://addons.mozilla.org/en-US/firefox/addon/firebug/.
[9] Google OR-Tools. https://developers.google.com/optimization/install/java/.
[10] Google Website Translator. https://translate.google.com/manager/website/.
[11] GWALI: Global Web Testing. https://sites.google.com/site/gwaliproject.
[12] IANA Language Subtag Registry. http://www.iana.org/assignments/language-subtag-registry.
[13] IBM Guidelines to Design Global Solutions. http://www-01.ibm.com/software/globalization/guidelines/a3.html.
[14] Layout Testing Library Documentation. https://linkedin.github.io/LayoutTest-iOS/index.html.
[15] Selenium. http://docs.seleniumhq.org/.
[16] W3C Internationalization Checker. https://validator.w3.org/i18n-checker/.
[17] Zurb Foundation. https://foundation.zurb.com.
[18] Abdulmajeed Alameer, Paul Chiou, and William G. J. Halfond. Efficiently repairing internationalization presentation failures by solving layout constraints. In Proceedings of the IEEE International Conference on Software Testing, Verification, and Validation (ICST), April 2019.
[19] Abdulmajeed Alameer and William G. J. Halfond. An empirical study of internationalization failures in the web. In Proceedings of the International Conference on Software Maintenance and Evolution (ICSME), October 2016.
[20] Abdulmajeed Alameer, Sonal Mahajan, and William G. J. Halfond. Detecting and Localizing Internationalization Layout Failures in Web Applications. In submission.
[21] Abdulmajeed Alameer, Sonal Mahajan, and William G. J. Halfond. Detecting and localizing internationalization presentation failures in web applications. In Proceedings of the 9th IEEE International Conference on Software Testing, Verification, and Validation (ICST), April 2016.
[22] J. Archana, S. R. Chermapandan, and S. Palanivel. Automation framework for localizability testing of internationalized software.
In International Conference on Human Computer Interactions (ICHCI), Aug 2013.
[23] Aiman M. Ayyal Awwad and Wolfgang Slany. Automated Bidirectional Languages Localization Testing for Android Apps with Rich GUI. Mobile Information Systems, 2016.
[24] Alan Borning, Richard Lin, and Kim Marriott. Constraints for the web. In Proceedings of the Fifth ACM International Conference on Multimedia, MULTIMEDIA '97, pages 173-182, New York, NY, USA, 1997. ACM.
[25] Shauvik Roy Choudhary, Mukul R. Prasad, and Alessandro Orso. X-PERT: Accurate Identification of Cross-Browser Issues in Web Applications. In Proceedings of the 35th IEEE and ACM SIGSOFT International Conference on Software Engineering (ICSE), May 2013.
[26] S. R. Choudhary, H. Versee, and A. Orso. WEBDIFF: Automated Identification of Cross-Browser Issues in Web Applications. In IEEE International Conference on Software Maintenance (ICSM), Sept 2010.
[27] Florian N. Egger. "Trust Me, I'm an Online Vendor": Towards a Model of Trust for e-Commerce System Design. In CHI Extended Abstracts on Human Factors in Computing Systems. ACM, 2000.
[28] Andrea Everard and Dennis F. Galletta. How Presentation Flaws Affect Perceived Site Quality, Trust, and Intention to Purchase from an Online Store. Journal of Management Information Systems, 22:56-95, January 2006.
[29] Thomas S. Ferguson. Linear programming: A concise introduction. Website. Available at http://www.math.ucla.edu/~tom/LP.pdf, 2000.
[30] B. J. Fogg, Cathy Soohoo, David R. Danielson, Leslie Marable, Julianne Stanford, and Ellen R. Tauber. How do users evaluate the credibility of web sites?: A study with over 2,500 participants. In Proceedings of the 2003 Conference on Designing for User Experiences, DUX '03, pages 1-15, New York, NY, USA, 2003. ACM.
[31] S. Hallé, N. Bergeron, F. Guerin, and G. Le Breton. Testing Web Applications Through Layout Constraints.
In Software Testing, Verification and Validation (ICST), 2015 IEEE 8th International Conference on, pages 1-8, April 2015.
[32] Richard Ishida. Text Size in Translation. http://www.w3.org/International/articles/article-text-size.en.
[33] James A. Jones, James F. Bowring, and Mary Jean Harrold. Debugging in Parallel. In Proceedings of the 2007 International Symposium on Software Testing and Analysis, ISSTA '07, pages 16-26, New York, NY, USA, 2007. ACM.
[34] Gitte Lindgaard, Gary Fernandes, Cathy Dudek, and J. Brown. Attention Web Designers: You Have 50 Milliseconds to Make a Good First Impression! Behaviour & Information Technology, 25(2):115-126, 2006.
[35] Fan Long and Martin Rinard. Staged program repair with condition synthesis. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, ESEC/FSE, 2015.
[36] Sonal Mahajan, Negarsadat Abolhassani, Phil McMinn, and William G. J. Halfond. Automated repair of mobile friendly problems in web pages. In Proceedings of the International Conference on Software Engineering (ICSE), May 2018.
[37] Sonal Mahajan, Abdulmajeed Alameer, Phil McMinn, and William G. J. Halfond. Effective repair of internationalization presentation failures in web applications using style similarity clustering and search-based techniques. In submission.
[38] Sonal Mahajan, Abdulmajeed Alameer, Phil McMinn, and William G. J. Halfond. Automated repair of layout cross browser issues using search-based techniques. In Proceedings of the International Symposium on Software Testing and Analysis (ISSTA), July 2017.
[39] Sonal Mahajan, Abdulmajeed Alameer, Phil McMinn, and William G. J. Halfond. XFix: An automated tool for repair of layout cross browser issues. In Proceedings of the International Symposium on Software Testing and Analysis (ISSTA) - Tool Demo, July 2017.
[40] Sonal Mahajan, Abdulmajeed Alameer, Phil McMinn, and William G. J. Halfond.
Automated repair of internationalization failures using style similarity clustering and search-based tech- niques. In Proceedings of the International Conference on Software Testing, Validation and Verication (ICST), April 2018. [41] Sonal Mahajan and William G. J. Halfond. Finding html presentation failures using image comparison techniques. In Proceedings of the 29th IEEE/ACM International Conference on Automated Software Engineering (ASE) { New Ideas track, September 2014. [42] Sonal Mahajan and William G. J. Halfond. Detection and localization of html presentation failures using computer vision-based techniques. In Proceedings of the 8th IEEE International Conference on Software Testing, Verication and Validation (ICST), April 2015. [43] Sonal Mahajan and William G.J. Halfond. Websee: A tool for debugging html presenta- tion failures. In Proceedings of the 8th IEEE International Conference on Software Testing, Verication and Validation (ICST) - Tool Track, April 2015. [44] Sonal Mahajan, Bailan Li, Pooyan Behnamghader, and William G.J. Halfond. Using visual symptoms for debugging presentation failures in web applications. In Proceedings of the 9th IEEE International Conference on Software Testing, Verication, and Validation (ICST), April 2016. [45] Ali Mesbah and Arie van Deursen. Invariant-Based Automatic Testing of AJAX User Inter- faces. In Proceedings of the 31st International Conference on Software Engineering, ICSE '09, pages 210{220, Washington, DC, USA, 2009. IEEE Computer Society. [46] R.M.L.M. Moreira, A.C.R. Paiva, and A. Memon. A Pattern-Based Approach for GUI Model- ing and Testing. In Software Reliability Engineering (ISSRE), 2013 IEEE 24th International Symposium on, pages 288{297, Nov 2013. [47] BaoN. Nguyen, Bryan Robbins, Ishan Banerjee, and Atif Memon. GUITAR: An Innova- tive Tool for Automated Testing of GUI-Driven Software. Automated Software Engineering, 21(1):65{105, 2014. 
[48] Hung Viet Nguyen, Hoan Anh Nguyen, Tung Thanh Nguyen, and Tien N. Nguyen. Auto- locating and Fix-propagating for HTML Validation Errors to PHP Server-side Code. In Proceedings of the 2011 26th IEEE/ACM International Conference on Automated Software Engineering, ASE, 2011. 73 [49] Tung Thanh Nguyen, Hoan Anh Nguyen, Nam H. Pham, Jafar Al-Kofahi, and Tien N. Nguyen. Recurring Bug Fixes in Object-oriented Programs. In Proceedings of the 32Nd ACM/IEEE International Conference on Software Engineering - Volume 1, ICSE, 2010. [50] Pavel Panchekha, Adam Geller, Michael D. Ernst, Zachary Tatlock, and Shoaib Kamil. Ver- ifying that web pages have accessible layout. In PLDI 2018: Proceedings of the ACM SIG- PLAN 2016 Conference on Programming Language Design and Implementation, Philadel- phia, PA, USA, June 2018. [51] Pavel Panchekha and Emina Torlak. Automated Reasoning for Web Page Layout. In Pro- ceedings of the ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA, 2016. [52] R. Ramler and R. Hoschek. How to test in sixteen languages? automation support for localization testing. In IEEE International Conference on Software Testing, Verication and Validation (ICST), March 2017. [53] Nick Roussopoulos, Stephen Kelley, and Fr ed eric Vincent. Nearest Neighbor Queries. In Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data, SIGMOD '95, pages 71{79, New York, NY, USA, 1995. ACM. [54] Y. Ryou and S. Ryu. Automatic detection of visibility faults by layout changes in html5 web pages. In 2018 IEEE 11th International Conference on Software Testing, Verication and Validation (ICST), pages 182{192, April 2018. [55] Hesam Samimi, Max Sch afer, Shay Artzi, Todd Millstein, Frank Tip, and Laurie Hendren. Automated repair of HTML generation errors in PHP applications using string constraint solving. In Proceedings of the International Conference on Software Engineering, ICSE, 2012. [56] N. 
Semenenko, M. Dumas, and T. Saar. Browserbite: Accurate Cross-Browser Testing via Machine Learning over Image Features. In 29th IEEE International Conference on Software Maintenance (ICSM), pages 528{531, Sept 2013. [57] Elizabeth Sillence, Pam Briggs, Lesley Fishwick, and Peter Harris. Trust and mistrust of on- line health sites. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '04, pages 663{670, New York, NY, USA, 2004. ACM. [58] Nishant Sinha and Rezwana Karim. Responsive designs in a snap. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2015, pages 544{554, New York, NY, USA, 2015. ACM. [59] Michael Tamm. Fighting Layout Bugs. https://code.google.com/p/ fighting-layout-bugs/. [60] Thomas Walsh, Gregory Kapfhammer, and Phil McMinn. Automated Layout Failure De- tection for Responsive Web Pages without an Explicit Oracle. In Proceedings of the 26th International Symposium on Software Testing and Analysis (ISSTA), July 2017. [61] Thomas A. Walsh, Phil McMinn, and Gregory M. Kapfhammer. Automatic Detection of Potential Layout Faults Following Changes to Responsive Web Pages. In International Conference on Automated Software Engineering (ASE), 2015. [62] X. Wang, L. Zhang, T. Xie, H. Mei, and J. Sun. Transtrl: An automatic need-to-translate string locator for software internationalization. In 2009 IEEE 31st International Conference on Software Engineering, pages 555{558, May 2009. 74 [63] Xiaoyin Wang, Lu Zhang, Tao Xie, Hong Mei, and Jiasu Sun. Locating Need-to-Translate Constant Strings in Web Applications. In Proceedings of the Eighteenth ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE, 2010. [64] Xiaoyin Wang, Lu Zhang, Tao Xie, Yingfei Xiong, and Hong Mei. Automating presentation changes in dynamic web applications via collaborative hybrid analysis. 
In Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering, FSE, 2012. [65] Westley Weimer, ThanhVu Nguyen, Claire Le Goues, and Stephanie Forrest. Automati- cally nding patches using genetic programming. In Proceedings of the 31st International Conference on Software Engineering, ICSE, 2009. [66] Sai Zhang, Hao L u, and Michael D. Ernst. Automatically Repairing Broken Work ows for Evolving GUI Applications. In Proceedings of the International Symposium on Software Testing and Analysis, ISSTA, 2013. 75
Abstract
Web applications can easily be made available to an international audience by leveraging frameworks and tools for automatic translation and localization. However, these automated changes can introduce Internationalization Presentation Failures (IPFs): undesired distortions of a web page's intended appearance that occur as HTML elements expand, contract, or move to accommodate the translated text. It is challenging for developers to design websites that inherently adapt to the expansion and contraction of text after it is translated into different languages. Existing web testing techniques do not support developers in debugging these types of problems, and manually testing every page in every language is a labor-intensive and error-prone task.

In my dissertation work, I designed and evaluated two techniques to help developers debug web pages that have been distorted by internationalization efforts. In the first part of my dissertation, I designed an automated approach for detecting IPFs and identifying the HTML elements responsible for the observed problem. In the evaluation, my approach detected IPFs in a set of 70 web applications with high precision and recall and accurately identified the underlying elements in the web pages that led to the observed IPFs. In the second part of my dissertation, I designed an approach that can automatically repair web pages that have been distorted by internationalization efforts. My approach models the correct layout of a web page as a system of constraints; the solution to this system represents the new, corrected layout of the web page that resolves its IPFs. The evaluation of this approach showed that it could more quickly produce repaired web pages that were rated as more attractive and more readable than those produced by a prior state-of-the-art technique.
Overall, these results are positive and indicate that both my detection and repair techniques can assist developers in debugging IPFs in web applications with high effectiveness and efficiency.
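To make the detection idea concrete, the following is a minimal illustrative sketch (not the dissertation's implementation): it compares the bounding boxes of corresponding elements in a baseline rendering and a translated rendering, and flags element pairs whose overlap relationship changed after translation. The element names, box format (x, y, width, height), and the restriction to a single overlap relation are all simplifying assumptions for illustration.

```python
def overlaps(a, b):
    """True if axis-aligned rectangles a and b intersect.

    Each rectangle is (x, y, width, height).
    """
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def detect_ipfs(baseline, translated):
    """Flag element pairs whose overlap relationship changed after translation.

    baseline/translated: dict mapping element id -> (x, y, width, height),
    e.g. as obtained from a browser's layout engine for the two renderings.
    Returns a sorted list of (id1, id2) pairs that overlap in exactly one
    of the two versions -- a symptom of a potential IPF.
    """
    failures = []
    ids = sorted(baseline)
    for i, e1 in enumerate(ids):
        for e2 in ids[i + 1:]:
            before = overlaps(baseline[e1], baseline[e2])
            after = overlaps(translated[e1], translated[e2])
            if before != after:
                failures.append((e1, e2))
    return failures

# Hypothetical example: translated text widened the nav bar so it now
# collides with the button next to it.
baseline = {"nav": (0, 0, 100, 20), "button": (110, 0, 80, 20)}
translated = {"nav": (0, 0, 130, 20), "button": (110, 0, 80, 20)}
print(detect_ipfs(baseline, translated))  # [('button', 'nav')]
```

The repair side can be viewed in the same terms: the layout relationships observed in the baseline become constraints over element positions and sizes, and a solver searches for an assignment that satisfies them for the translated text, yielding the corrected layout.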
Asset Metadata
Creator: Alameer, Abdulmajeed (author)
Core Title: Detection, localization, and repair of internationalization presentation failures in web applications
School: Viterbi School of Engineering
Degree: Doctor of Philosophy
Degree Program: Computer Science
Publication Date: 07/19/2019
Defense Date: 03/19/2019
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: OAI-PMH Harvest, testing, internationalization, web applications
Format: application/pdf (imt)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Halfond, William G. J. (committee chair), Deshmukh, Jyotirmoy (committee member), Gupta, Sandeep (committee member), Medvidovic, Nenad (committee member), Wang, Chao (committee member)
Creator Email: alameer.v@gmail.com, alameer@usc.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c89-184088
Unique identifier: UC11660089
Identifier: etd-AlameerAbd-7560.pdf (filename), usctheses-c89-184088 (legacy record id)
Legacy Identifier: etd-AlameerAbd-7560.pdf
Dmrecord: 184088
Document Type: Dissertation
Rights: Alameer, Abdulmajeed
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA