UTILIZING USER FEEDBACK TO ASSIST SOFTWARE DEVELOPERS TO
BETTER USE MOBILE ADS IN APPS
by
Jiaping Gui
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)
May 2019
Copyright 2019 Jiaping Gui
Dedication
To the glory and honor of God Almighty
Acknowledgements
First of all, I would like to thank my advisor, William G.J. Halfond. Without his help and patience,
I would not have been able to finish my PhD study and research. I appreciate the opportunity
of being advised by G.J. The experiences working together are still fresh in my memory. In each
research project, G.J. inspired me to move forward on the right track by asking me questions. I
am also grateful to him for his kind support and advice in my job hunting. Whenever I needed
his support (e.g., reference letter), G.J. was always willing to help. During my PhD life, I have
learned a lot from G.J. In particular, his rigorous attitude and passion for research will benefit
me in my future life.
I would like to thank my research collaborator Meiyappan Nagappan who participated in most
of my research projects, and provided kind help regarding reference letters. Mei was always ready
to help me with his expertise. Particularly, his input with professional knowledge greatly improved
my research. I enjoyed working with Mei, and really appreciate his kind help.
I would like to thank my committee members, Nenad Medvidovic, Chao Wang, Jyotirmoy
Deshmukh, and Paul Bogdan for their generous help and support. Their feedback greatly improved
my dissertation work.
I would like to thank my intern mentor Xusheng Xiao. In the summer of 2016, I was fortunate
to be an intern supervised by Xusheng. His support and encouragement led me to make progress
in the project. Without his professional skills and knowledge, I would not have been able to submit a paper and a patent within three months. I am also thankful for his kind support in writing reference letters for me.
I was also fortunate to have nice lab mates, Ding Li, Sonal Mahajan, Bailan Li, Mian Wan,
Yingjun Lyu, Abdulmajeed Alameer, Negarsadat Abolhasani, and Seyed Hossein Alavi. Whenever
I needed some help, they were so kind to assist me. They gave me lots of happy moments in the
lab. In particular, I would like to thank Ding Li, who is both my lab mate and internship mentor,
for being always ready to share his experiences and ideas when I met problems in research.
I am very grateful to my family. My parents Quandong Gui and Meizhen Gui, and my sisters
Danping Gui and Danqing Gui always supported me in all my pursuits. Without their unending
love, care and support, I could not imagine how I could go through my PhD life. I would also
like to offer my special thanks to my wife, Xingchen Wang, and my parents-in-law, Jiangang Wang and Fang Fu. They provided me with constant encouragement and support to help me overcome
challenges in my PhD life. I would also like to thank my aunt, my uncle, and other relatives who
are always with me whenever I need their help in my life.
Table of Contents
Dedication
Acknowledgements
List of Tables
List of Figures
Abstract
Chapter 1: Introduction
1.1 Problems and Motivation
1.2 Major Challenges
1.3 Insights and Hypothesis
1.4 Summary of Results
Chapter 2: Background
2.1 Mobile Advertising Ecosystem
2.2 Mobile Advertising Basics
2.3 Mobile App Store
Chapter 3: Overview of Dissertation
3.1 Ad Topic Identification
3.2 Ad Topic Quantification
3.3 Ad Topic Impact Analysis
Chapter 4: An Empirical Study of Mobile In-app Ad Reviews
4.1 Research Questions
4.2 Results and Discussion
4.2.1 RQ1: How frequent are ad-related reviews among all app reviews?
4.2.2 RQ2: What are the common topics of ad reviews?
4.3 Threats to Validity
4.4 Conclusions
Chapter 5: Quantification and Analysis of UI-related Ad Metrics
5.1 Motivation for the Research Questions
5.2 Methodology
5.2.1 Subject App Selection
5.2.2 Workload Generation
5.2.3 Data Collection
5.2.4 Analysis to Identify Ad Usage
5.2.5 Statistical Analysis
5.3 Results
5.3.1 RQ1: What is the relationship between ad format and app ratings?
5.3.2 RQ2: What is the relationship between ad frequency and app ratings?
5.3.3 RQ3: What is the relationship between banner ad size and app ratings?
5.3.4 RQ4: What is the relationship between banner ad position and app ratings?
5.3.5 RQ5: What is the relationship between ads in the landing page and app ratings?
5.3.6 RQ6: What is the relationship between repeated ad content and app ratings?
5.4 Threats to Validity
5.5 Conclusions
Chapter 6: Quantification and Analysis of Non-UI-related Ad Metrics
6.1 Motivation
6.2 Case Study Design
6.2.1 Selection of Subject Applications
6.2.2 Instrumentation of the Subject Applications
6.2.3 Generation of Subjects' Workloads
6.2.4 Monitoring and Analysis of Subject Applications
6.3 Results and Discussion
6.3.1 RQ 1: What is the performance cost of ads?
6.3.2 RQ 2: What is the energy cost of ads?
6.3.3 RQ 3: What is the network cost of ads?
6.3.4 RQ 4: What is the rate of app updates related to ads?
6.3.5 RQ 5: What is the impact of ads on an app's ratings?
6.4 Generalizability
6.5 Threats to Validity
6.6 Conclusion
Chapter 7: Related Work
7.1 Study of Ad Costs
7.2 Study of Ad Libraries
7.3 Study of Ad Surveys
7.4 Study of User Reviews
7.5 Study of Ad Optimization
7.6 Study of Ad Fraud
7.7 Study of Ad Security
Chapter 8: Conclusion and Future Work
8.1 Summary
8.2 Future Work
References
List of Tables
4.1 Distribution of star ratings among all user reviews
4.2 Common topics and their example reviews among 400 ad reviews in the manual study
4.3 Comparison between complaint and positive-comment topics with regard to the number of occurrences and the mean and median rating
5.1 Identified ad-related User Interface (UI) complaint topics and their ratios and example reviews
5.2 Comparison between individual app category and whole dataset with regard to the number of apps; the minimum, mean, median, and maximum ratings; and the CFD-percentage change (last three columns) for three rating points (25th, 50th, and 75th-percentile ratings) when there was a 0.01 rating difference
6.1 Subject applications for the quantification and analysis of non-UI-related ad metrics
6.2 Comparison of non-UI-related ad costs among 21 apps with only one ad network, two pairs of apps with two ad networks, and 773 apps that were used in the analysis of UI-related ad aspects
List of Figures
2.1 Mobile advertising ecosystem with four stakeholders: advertisers, mobile ad networks, developers, and end users
2.2 An example of Facebook user reviews on the Google Play app store
3.1 Architecture overview of the dissertation work with three high-level research thrusts: Ad Topic Identification, Ad Topic Quantification, and Ad Topic Impact Analysis
4.1 Distribution of the number and ratio of ad reviews across app categories: the left Y axis represents the number of ad reviews (in blue color), and the right Y axis represents the ratio of ad reviews (in pink color)
4.2 Heat map of the coding-result breakdown of 400 ad reviews in the manual study: each row represents the frequency of ad reviews with ratings from one to five stars, and the color intensity in each cell represents the number of ad reviews (i.e., the deeper the color, the larger the number of ad reviews)
5.1 Cumulative frequency distribution (CFD) of star ratings among 10,750 apps: the X axis represents the app rating from one to five stars, the Y axis represents the CFD in terms of percentage, and two example points (in pink and light blue color) are demonstrated on the distribution curve
5.2 Boxplot of the star-rating distribution across 30 app categories: the X axis represents the star-rating distribution from one to five stars, and the Y axis represents the app category
6.1 Relative performance cost in terms of memory usage (dark red bar) and CPU utilization (light blue bar): each bar represents the percent difference between the performance metrics for the with-ads and no-ads versions. A higher number means that the with-ads version had a higher value for the metric
6.2 Relative energy cost: each bar represents the percent difference between the energy metric for the with-ads and no-ads versions. A higher number means that the with-ads version consumed more energy
6.3 Relative network cost in terms of data usage (light blue bar) and number of network packets (dark red bar): each bar represents the percent difference between the network metrics for the with-ads and no-ads versions. A higher number means that the with-ads version had a higher value for the metric
6.4 Ad-related maintenance cost: each bar represents the ratio of the number of app versions that had ad-related changes to the number of app versions that have been released. A higher number means that the app had a higher ratio of app versions with ad-related changes
6.5 Percentage of complaints: each bar represents the percentage of one- and two-star reviews with user complaints about ads (dark red bar) or one of the hidden costs (light blue bar). A higher number means a higher ratio of complaints for the metric
Abstract
In the mobile app ecosystem, developers receive ad revenue by placing ads in their apps and releasing them for free. While there is evidence that users do not like ads, we do not know the aspects of ads that users dislike, nor whether they dislike certain aspects of ads more than others. Therefore, in the first part of this dissertation work, I analyzed the different ad-related topics in app reviews from users. In order to do this, I investigated user reviews of apps that contained complaints about ads (ad reviews). I examined a randomly selected sample of 400 ad reviews to identify ad-related topics in a systematic way. I found that most ad complaints were about UI-related topics and that the three topics discussed most predominantly were: the frequency with which ads were displayed, the timing of when ads were displayed, and the location of the displayed ads. I also found that users reviewed non-UI aspects of mobile advertising, such as ads blocking or slowing down the host app's operation. My results provide actionable information to software developers regarding the aspects of ads that are most likely to be complained about by users in their reviews.
The results of the above work indicate that although the apps are ostensibly free, they do, in fact, come with costs. To analyze ad costs related to non-UI aspects, I designed a systematic approach to study 21 real-world Android apps. The results showed that the use of ads led to the consumption of significantly more network data, increased energy consumption, and repeated changes to ad-related code. I also found that complaints about these costs were significant and could affect the rating (on a scale of one to five stars) given to an app. My results provide actionable information and guidance to software developers in weighing the tradeoffs of incorporating ads into their mobile apps.
In the third part of my dissertation work, I systematically investigated UI aspects of mobile ads. My prior results showed that the improper use of ads could become a source of complaints and bad ratings for an app. Hence, developers must know how, where, and when to display ads in their apps. Unfortunately, very little guidance is available for developers, and most advice tends to be anecdotal, too general, or not supported by quantitative evidence. To address this, I investigated UI-related ad topics, which my prior work had identified as the most common type of ad complaint. To carry out this investigation, I developed analyses to quantify aspects related to these UI topics and then analyzed whether there existed a relationship between these values and the ratings assigned to the app by end users. I found that lower ratings (with statistical significance) were generally associated with apps that had certain visual patterns of ad implementation, such as ads in the middle or at the bottom of the page, and ads on the initial landing page of an app. Based on the results, I created a set of guidelines to help app developers more effectively use ads in their apps.
Chapter 1
Introduction
Mobile advertising has become an important part of many software developers' marketing and
advertising strategies [24]. This development has come about in just a matter of a few years.
Recent studies have shown that over half of all apps contain ads [91], and these ads collectively
generate an enormous amount of revenue for mobile app developers. As a case in point, a decade
ago, total mobile ad spending was just 320 million dollars. By 2014, the mobile advertising
industry's revenue had topped 49.5 billion dollars [88], and by 2018, analysts expect that mobile
ads will drive 75% of all digital ad spending and achieve a 20% increase from 2017 to over 70
billion dollars [28].
Mobile ads represent a seemingly straightforward way for developers to monetize their development efforts. Developers place ads in their apps and then receive a small payment when these ads are viewed or clicked. However, for mobile app developers, the usage of ads represents a difficult balancing act. On the one hand, developers seek to maximize revenue by displaying ads wherever possible. On the other hand, developers must be careful not to use ads in a way that will significantly degrade the user experience, as this can reduce an app's chances for success.
1.1 Problems and Motivation
The presence of mobile ads has become pervasive in the app ecosystem. This has been driven by
the development of large-scale advertising networks, such as Google Mobile Ads, that facilitate
interaction between developers and advertisers. In addition, mobile app stores provide users a
platform to leave reviews and rate apps. This feedback can influence the behavior of other users, who may avoid negatively rated apps, and can also be a source of useful bug reports or suggestions for improvement for developers. Prior studies have shown that these reviews cover a wide range of topics, such as an app's functionality, quality, and performance [59] and that specific areas
of complaints, such as user dissatisfaction with ads, can negatively impact the ratings an app
receives [44]. It is in the interest of developers to avoid negative reviews and ratings as these will
make their app less appealing to new users or cause the app to be ranked lower by the app store.
(An app is ranked by the app store based on its popularity among end users.) In turn, fewer
downloads of the app are likely to lead to fewer ad clicks and views, which cuts into potential
advertising revenue that could be earned by developers. However, in the case of ads, the situation
is more complicated. Developers will not simply remove ads and, therefore, must nd a balance in
their use of advertising that avoids a negative experience, while enabling them to earn advertising
revenue. For example, placing ads in all pages of the app increases the chances that users interact
with these ads and thus increases the ad revenue developers receive, but it can also annoy users
due to the distraction of ads and, therefore, negatively impact ad revenue.
The effect of reviews and ratings on advertising revenue motivates developers to understand
what aspects of ads could cause negative or positive experiences. However, developers lack prac-
tical and even basic information about which ad-related aspects are more or less likely to produce
a negative experience for their users. In particular, developers lack a systematic ability to analyze
and understand ad-related reviews. Although there has been extensive study of app store reviews
(e.g., [20, 59, 74]), this work has not focused on ad-specific aspects.
To earn ad revenue, developers display ads in their apps by making calls to APIs provided by
an advertising network. Mobile Ad Networks (MANs) and app developer blogs have attempted to
provide guidance to developers on this topic [10, 19, 39, 78, 111]. However, this advice is generic
in nature, warning developers to avoid "too many" ads, for example, or not to overlap ads with
other elements. Although helpful to some extent, this advice does not provide developers with
guidance as to what could constitute too many ads or what sizes or positions of ads are likely to
be found more annoying than others. In particular, besides being unaware of what specic aspects
of mobile ads are complained about by end users, developers also lack information about whether
these aspects could actually in
uence an app's rating. Therefore, developers have to guess at user
preferences, which does not help prevent a drop in the app's ratings. In the hyper-competitive
app marketplace, even a small change in an app's rating can cause it to disappear from the rst
page of search results, which can signicantly reduce the app's chances of success [59].
1.2 Major Challenges
A casual perusal of app reviews shows that developers use ads in a way that often leads to negative
ratings and complaints. Developers who wish to avoid this situation face the lack of a systematic
approach to identify the most common or even the basic topics of end user reviews regarding
mobile in-app ads. For example, different people may have different interpretations of a single review. It is challenging for developers to extract useful information related to ads in a scientific manner. It is also infeasible for them to examine ad-related topics at a large scale through manual
work.
The second challenge for developers is that the usage of ads represents a complicated and
delicate balancing act. On the one hand, developers seek to maximize revenue by displaying
ads wherever possible. On the other hand, developers must be careful not to exceed a threshold
beyond which ads start to have a negative impact on the user experience, which in turn leads to
lower app ratings and reduces the app's potential for success in the marketplace. Identifying this
balance is important but difficult for developers, as there is a lack of guidance about how best to
utilize ads in apps. Therefore, developers are left to handle ads based on their assumptions about
how users will react or by responding to negative reviews posted about their app. Neither of these is a desirable approach, as the complainers may only be a vocal minority or the developers may
have incorrect assumptions about users' perceptions of ads.
Another challenge for developers is that an app may contain dozens of user interfaces or
activities, and there are many properties and best practices that must be checked on each user
interface. It is not feasible for them to manually check the ad implementation for each of the
pages in the app. Furthermore, the widespread diversity of the mobile app ecosystem, in terms
of OSs and hardware, means that these checks must be repeated on multiple platforms, which is
a labor-intensive process. This situation motivates the development of automated techniques to
quantify aspects of ad usage.
1.3 Insights and Hypothesis
The challenges in helping software developers make better use of ads in their apps motivate the
need for a systematic approach that helps app developers use ads more effectively. In this section,
I present the key insights underlying my research and the hypothesis that this dissertation tests
in order to realize the goal.
Insight One
My first key insight is that user reviews can be leveraged to identify ad-related user complaints about ad aspects, as well as positive comments on ad improvements. End users provide ratings
and reviews in an app store to give feedback on their experience with an app. The textual reviews
of the app cover a wide range of topics, such as the app's functionality, quality and performance,
which help others decide which apps they are interested in and would like to try. Ads are an
integral part of their host app since they share the same screen with the host app when displayed
and can access all the resources within the scope of all permissions granted to the host app. As a
result, ad-related aspects inevitably could be included in the user reviews of the app. From this
perspective, developers are able to extract useful information about mobile ads to improve the
user experience of their app, since these reviews are feedback directly from end users.
Insight Two
My second key insight is that ad-related aspects in an app can be quantified by leveraging various techniques, such as PUMA [50], for dynamic analysis. At a high level, there are two different types of aspects related to mobile ads: UI and non-UI. To measure UI aspects (e.g., ad size), we can gather two types of information: screenshots and UI layouts. The former represents the app's UI and the latter represents the structure of the current UI in terms of its components' locations and sizes. Since developers display ads in their apps by making calls to APIs from ad libraries and ads have unique size attributes, it is possible to accurately identify ads with this heuristic and examine ad UI features. For non-UI aspects (e.g., energy consumption), the measurement of related metrics requires that we have two versions of each app, one with ads and the other without. To create the ad-free version of an app, we can use instrumentation-based techniques to remove all invocations of APIs defined by the ad network. This is feasible, since these invocations can be identified by matching the package name (e.g., "ads," "mobileads," and "mobads") of the invocation's target method with that of known ad networks. The package names of ad networks can be found by examining their API documentation.
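As a rough illustration of this package-name heuristic (a sketch under stated assumptions, not the instrumentation tooling used in this dissertation), the following Java snippet flags invocations whose target method belongs to a known ad-network package; the package list and the helper's name are hypothetical.

import java.util.Arrays;
import java.util.List;

public class AdCallDetector {
    // Hypothetical list of ad-network package prefixes; the text above mentions
    // fragments such as "ads," "mobileads," and "mobads" appearing in such names.
    private static final List<String> AD_PACKAGE_PREFIXES = Arrays.asList(
            "com.google.android.gms.ads", "com.google.ads", "com.baidu.mobads");

    // Returns true if the fully qualified target method (e.g.,
    // "com.google.android.gms.ads.AdView.loadAd") lives in an ad-network package,
    // so the invocation could be removed when building the ad-free app version.
    static boolean isAdNetworkCall(String targetMethod) {
        for (String prefix : AD_PACKAGE_PREFIXES) {
            if (targetMethod.startsWith(prefix + ".")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isAdNetworkCall("com.google.android.gms.ads.AdView.loadAd")); // true
        System.out.println(isAdNetworkCall("android.widget.TextView.setText"));          // false
    }
}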
Insight Three
My third key insight is that the relationship between ad usage and app ratings can be analyzed.
This can be done by applying statistical analysis to correlate ad-related aspects with app ratings,
once these aspects are quantified. In related work [57, 91, 101, 27], app ratings have been widely leveraged to understand different app aspects, such as an app's functionality, quality, and popularity. Since mobile ads, if embedded, are an integral part of an app, user ratings can be used to understand the impact of different ad aspects on apps. Statistical analysis techniques are well suited for analyzing such a relationship, and results that reach statistical significance enable us to understand the relationship in a scientific way.
Hypothesis
Based on the above three insights, the hypothesis statement of this dissertation is:
User feedback can be utilized to analyze the impact of different ad aspects on apps.
1.4 Summary of Results
This section provides an overview of the results included in this dissertation. The research process
is mainly described in three chapters, Chapters 4, 5 and 6, which discuss the extraction of ad
topics from ad reviews, the quantification of ad topics, and the analysis of the relationship between
ad topics and app ratings, in terms of both UI and non-UI aspects. Each of the chapters is based
on one or more papers, which have been published or are under submission.
Chapter 4: An Empirical Study of Mobile In-app Ad Reviews
This chapter focuses on the different topics of ad-related reviews from users. In order to analyze these topics, I investigated app reviews that users gave for apps in the app store that were about ads. A random sample of 400 ad reviews was selected and their ad topics were extracted in a systematic way. I found that most ad reviews were complaints about UI-related topics, and three informative topics were brought up most often: the frequency with which ads were displayed, the full-screen size of the displayed ads, and the timing of when ads were displayed. My results provide actionable information to software developers regarding the aspects of ads that are most likely to be raised by users in their reviews. The work in this chapter is currently under final revision to be submitted to a software engineering journal [45].
Chapter 6: Quantification and Analysis of Non-UI-Related Ad Metrics
In this chapter, I studied 21 real-world Android apps to measure ad metrics corresponding to
non-UI-related ad topics that were extracted in Chapter 4. I found that the use of ads led to
mobile apps that consumed significantly more network data, had increased energy consumption,
and required repeated changes to ad-related code. I also found that complaints about these non-UI
aspects could impact the ratings given to an app. My results provide actionable information and
guidance to software developers in weighing the tradeoffs of incorporating ads into their mobile apps. This chapter includes two publications at the IEEE/ACM International Conference on Software Engineering. The first publication was a full paper in the main research track [44] and the second publication was a workshop paper [43].
Chapter 5: Quantification and Analysis of UI-Related Ad Metrics
This chapter presents the results of a first-ever investigation into the impact of the UI-related ad aspects that were the most common topics found in the process described in Chapter 4. I first developed program analyses to quantify aspects related to these topics and then statistically analyzed whether there existed a relationship between certain kinds of ad usage and the ratings assigned to the app by end users. From the findings, I distilled a set of recommendations to help app developers more effectively use ads in their apps. The work in this chapter is currently under
submission at a software engineering conference [42].
Chapter 2
Background
This chapter provides background information for the dissertation as a whole. Section 2.1 discusses
the fundamentals of the mobile advertising ecosystem and how the stakeholders in the system
interact with each other. The basics of mobile advertising are presented in Section 2.2. In
Section 2.3, I discuss the mobile app store where all raw data was collected for the analysis in the
dissertation.
2.1 Mobile Advertising Ecosystem
Figure 2.1: Mobile advertising ecosystem with four stakeholders: advertisers, mobile ad networks, developers, and end users
In the mobile advertising ecosystem (as shown in Figure 2.1), there are four main stakeholders:
end users, developers, advertisers, and MANs. To earn ad revenue, developers embed and display
ads in their apps. MANs, such as Google Mobile Ads, facilitate the interaction between developers
and advertisers. To do this, MANs maintain and distribute libraries that enable developers to
include ads served by the MANs in their apps. When an end user clicks on or views ads, the
developer receives a small payment from the MAN on behalf of the advertiser.
2.2 Mobile Advertising Basics
MANs provide sophisticated frameworks and infrastructure (e.g., ad libraries) to deliver ads and
allow developers to incorporate ads into their app. This process starts with developers signing in
to their MAN account (e.g., [36]) and registering an ad profile that provides them with a unique ID that corresponds to their app. Using this ID, developers can configure the types of ads they want their app to receive and characteristics of those ads. The exact configurations available vary by ad network, but developers are generally able to choose between different ad sizes and different types of content.
In the mobile ad ecosystem, in-app display advertisements are typically rendered as banners
or interstitial ads. Banners are small advertisements, usually at the top or bottom of the screen,
which allow a user to interact with the other visible elements of the app's User Interface (UI)
uninterrupted while the ad is being displayed. Interstitials are full-screen ads that block out an
app's other contents. They often remain frozen on the screen for a set number of seconds until a close button appears. Unlike banner ads, interstitials force users to interact by clicking through the ad. For Google AdMob, a banner ad can be rendered as text or an image, while an interstitial ad has three different types: text, image, and video.
To embed ads into mobile apps, developers need to call special APIs provided by an ad library. Listing 2.1 is an example of how an app loads banner ads from the Google ad library. In this program, developers use AdView to display ads. First, developers need to define the size of the ads (SIZE) (line 8). In practice, SIZE has a finite number of values and these values are predefined in the AdSize class. In this example, I show a common value of SIZE, which is the banner size. However, developers can also select other values, such as FULL_BANNER, LARGE_BANNER, and SMART_BANNER. After the size is specified, developers need to set the AD_UNIT_ID, a unique ID generated by an ad network that associates the app's requests to the ad network with the corresponding ad unit profile (line 9). After binding the AD_UNIT_ID, developers can specify two other configurations, TYPE and RRATE. TYPE defines what kind of ads, text or image, need to be displayed. RRATE defines how often the app refreshes the contents of its ads. For AdMob (a Google mobile advertising network), it is an integer ranging from 30 to 120, which means that the app reloads the contents of ads every 30 to 120 seconds.
Listing 2.1: Example of adding ads to an app
1  private static final String AD_UNIT_ID = "ca-app-pub-1234567890123456/1234567890";
2  private AdView adView;
3  public void onCreate(Bundle savedInstanceState)
4  {
5      super.onCreate(savedInstanceState);
6      setContentView(R.layout.activity_main);
7      adView = new AdView(this);
8      adView.setAdSize(AdSize.BANNER);
9      adView.setAdUnitId(AD_UNIT_ID);
10     LinearLayout layout = (LinearLayout) findViewById(R.id.linearLayout);
11     layout.addView(adView);
12     AdRequest adRequest = new AdRequest.Builder().build();
13     adView.loadAd(adRequest);
14 }
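For comparison with the banner code in Listing 2.1, an interstitial ad is handled through a separate InterstitialAd object rather than a layout view. The sketch below is only an approximation based on the Google Mobile Ads API of roughly this period (the SDK has since been restructured), so the exact calls should not be read as current or authoritative usage.

// Illustrative only: load a full-screen interstitial and show it once it is ready.
private InterstitialAd interstitialAd;

private void loadInterstitial() {
    interstitialAd = new InterstitialAd(this);
    interstitialAd.setAdUnitId(AD_UNIT_ID);              // unit ID analogous to line 1 of Listing 2.1
    interstitialAd.setAdListener(new AdListener() {
        @Override
        public void onAdLoaded() {
            interstitialAd.show();                        // takes over the whole screen until closed
        }
    });
    interstitialAd.loadAd(new AdRequest.Builder().build());
}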
Mobile ads that are pure images can be displayed in any UI component in which Android allows images to be displayed (e.g., FrameLayout), but the most common form is ads displayed in a WebView (note that WebView is a layout structure in Android, while AdView in Listing 2.1 is a view class object that extends ViewGroup to display banner ads). This is because a WebView provides the capability to display HTML-based web content as a part of an app's UI. It enables ad networks to deliver a variety of ad content and track ad-related information (e.g., ad traffic quality). An ad-based WebView has size restrictions and other spatial attributes that are distinct from a non-ad WebView.
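A simplified sketch of how such a size heuristic could be checked is shown below; the subset of standard banner dimensions (in density-independent pixels) and the method name are illustrative assumptions, not the exact rule used later in this dissertation.

public class AdSizeHeuristic {
    // A few standard AdMob banner sizes in dp (width x height); illustrative subset.
    private static final int[][] STANDARD_AD_SIZES = {
            {320, 50},   // BANNER
            {320, 100},  // LARGE_BANNER
            {300, 250},  // MEDIUM_RECTANGLE
            {468, 60},   // FULL_BANNER
            {728, 90}    // LEADERBOARD
    };

    // Returns true if a WebView's width and height (in dp) match a standard ad
    // size exactly, which suggests the WebView is hosting an ad.
    static boolean looksLikeAd(int widthDp, int heightDp) {
        for (int[] size : STANDARD_AD_SIZES) {
            if (size[0] == widthDp && size[1] == heightDp) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeAd(320, 50));   // true: standard banner dimensions
        System.out.println(looksLikeAd(360, 400));  // false: ordinary content view
    }
}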
Many factors affect the success of mobile advertising for advertisers and developers. These
include:
1. ad placement in relevant and popular apps,
2. the quantity and quality of clicks on ads (i.e., do clicks on the ad lead to purchases),
3. ways in which developers make use of the ads in their apps.
For the first two factors, the MAN generally uses data mining techniques to help match apps and
ads, and carefully tracks the clicks and eventual actions of end users using tracking cookies and
other mechanisms, while the third factor is under the direct control of app developers.
To help developers use ads effectively, MANs publicize guidelines and best practices for ad usage. For example, MANs ban any kind of artificial clicks performed on an advertisement to inflate click-through rates and hence income. Similarly, MANs also prohibit many placements of ads that have similar effects, such as placing an ad too close to a UI element that has to be clicked. Violating these guidelines can lead to a developer being cut off from usage of the MAN. Although the consequence of violating some ad usage guidelines is not as severe as that of the others that lead to being cut off, in the best case the violation represents a usage of ads that has been identified as less than ideal. These usages may negatively affect an end user's in-app experience
and lead to lower ratings or negative reviews of an app. Alternatively, they may simply lead to
fewer impressions and click-throughs than would have otherwise been achieved with better ad
usage.
2.3 Mobile App Store
Figure 2.2: An example of Facebook user reviews on Google Play app store
An important additional, but somewhat indirect, player in the mobile ad ecosystem is the app
store. Users can leave reviews and rate apps in the app store. Figure 2.2 shows an example of
user reviews for the popular Facebook app in the Google Play app store. In the figure, the average rating (on a scale of 1 to 5 stars), the score distribution, and the total number of downloads are provided. This feedback can influence the behavior of other users, who may avoid negatively
rated apps, and can also be a source of useful bug reports or suggestions for improvement for
developers.
Chapter 3
Overview of Dissertation
In this chapter, I will give an overview of my dissertation work. The goal of my dissertation is
to assist developers to better use ads in their apps based on users' feedback. To achieve this, I first identified ad-related topics by investigating app reviews that users gave for apps in the app store and that were about ad aspects. Using these topics, I then defined metrics to quantify different ad aspects with respect to ad topics, in terms of both UI and non-UI patterns, and built various analyses to determine these metrics. Finally, I carried out analyses on real-world apps and performed extensive impact analyses to determine the relationship between ad aspects and app ratings. The output is a report that provides a set of guidelines to help app developers more effectively use ads in their apps.
To achieve the above goal, I divide my dissertation work into three high-level research thrusts:
Ad Topic Identification from app reviews (Section 3.1), Ad Topic Quantification of different ad
aspects (Section 3.2), and Ad Topic Impact Analysis, including statistical analysis (Section 3.3).
The architecture of the entire system with the three research thrusts and their components, as
well as the inputs and output of the system, is shown in Figure 3.1. Below, I describe each research thrust and its components in detail, with reference to Figure 3.1.
3.1 Ad Topic Identification
The goal of ad-related topic identification is to identify the different aspects of ads that trigger
ad reviews from end users. To do this, I performed a systematic investigation as described below:
1. I began by identifying ad-related topics from a corpus of over 40 million app store reviews.
I found that a large number of user reviews were related to mobile advertising.
Figure 3.1: Architecture overview of the dissertation work with three high-level research thrusts: Ad Topic Identification, Ad Topic Quantification, and Ad Topic Impact Analysis
2. I then analyzed the ratings and text of the reviews and found that those that mentioned
advertising were disproportionately likely to receive lower ratings than non-ad-related re-
views.
3. In a manual analysis, I leveraged qualitative content analysis and systematically analyzed a
statistically significant sample of these reviews to identify the most common topics of end
user complaints on ad aspects or positive comments on ad improvement. I found most ad
reviews were complaints that were related to how ads interfered or interacted with UI-related
aspects of the mobile app. In particular, I found that UI issues relating to the frequency
with which ads were displayed, the full-screen size of pop-up ads, and the location of ads
were the most frequently mentioned complaints. For non-visual aspects, behaviors such as
the ad automatically downloading files or changing system settings, and blocking or crashing the host app's execution, were the most frequently mentioned complaints.
The full description of this investigation is in Chapter 4.
3.2 Ad Topic Quantification
The goal of ad topic quantification is to quantify the different ad aspects that correspond to each of the ad topics identified in Section 3.1. To do this, I developed dynamic and static analyses to measure these aspects based on the attributes associated with each ad-related topic. At a high level, there are two different types of ad aspects: UI-related and non-UI-related.
To quantify non-UI ad aspects, I first prepared two versions of each app, one with ads and the other without. To create the no-ads version of an app, I used instrumentation-based techniques to remove all invocations of APIs defined by the ad network. Then I created workloads to execute the app and exercise its functionality. The goals for each workload I created were: (1) as complete as possible with respect to the app's primary functions; (2) repeatable across multiple executions of the app; and (3) long enough to ensure several ad reload cycles. To generate workloads, I leveraged the RERAN tool [35]. This tool records a user's interaction with an app as a series of events and can then replay these events at a later time. Then I collected runtime data on the non-UI-related ad metrics (e.g., CPU and memory utilization) by running both versions of each app (with ads and no ads) with the same workload while monitoring its execution. Each non-UI ad metric was determined by the difference in the corresponding non-UI metric between the with-ads and no-ads versions. For example, if the complaints were about ads slowing down the app, I collected the system resources (e.g., CPU and memory utilization) that were consumed by mobile ads and determined whether the ads affected the responsiveness of the system.
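As a minimal sketch of the with-ads versus no-ads comparison (the helper name and inputs are assumptions for illustration), each non-UI metric can be reported as the percent difference between the two runs:

public class RelativeCost {
    // Percent difference between the with-ads and no-ads measurements of a metric
    // (e.g., joules of energy, bytes of network data, average CPU utilization).
    // A positive result means the with-ads version had the higher value.
    static double percentDifference(double withAds, double noAds) {
        if (noAds == 0.0) {
            throw new IllegalArgumentException("no-ads baseline must be non-zero");
        }
        return (withAds - noAds) / noAds * 100.0;
    }

    public static void main(String[] args) {
        // Hypothetical example: 12.4 MB of network data with ads vs. 2.1 MB without.
        System.out.printf("Relative network cost: %.1f%%%n", percentDifference(12.4, 2.1));
    }
}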
To quantify UI-related ad aspects, I began by selecting a subset of apps from the Google Play app store. Then I ran a generated workload on each of the subject apps so that I could collect information about how the app's ads were used and displayed at runtime. This was done using PUMA [50], a mobile app UI crawler framework built on top of the Monkey tool from Google. While each app was running, I collected periodic screenshots and detailed information about the layout of the app's UI. To do this, I modified PUMA so that a separate thread ran the UIAutomator (UIAT) [40], which is part of the Android Testing Support Library. After doing this for all apps, I performed an analysis of the collected information to identify the UI-related ad metrics. For example, for the ad position complained about by end users, I extracted ad-related coordinates from the page layout and detected whether the ad was at the top, in the middle, or at the bottom of the page.
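A simplified sketch of this position check is shown below; it assumes the ad view's vertical bounds and the screen height have already been extracted from the layout dump, and the even three-way split into top, middle, and bottom regions is an illustrative choice rather than the exact rule used in Chapter 5.

public class AdPositionClassifier {
    enum Position { TOP, MIDDLE, BOTTOM }

    // Classifies an ad by the vertical center of its bounding box, using the top,
    // middle, and bottom thirds of the screen as illustrative regions.
    static Position classify(int adTopY, int adBottomY, int screenHeight) {
        int centerY = (adTopY + adBottomY) / 2;
        if (centerY < screenHeight / 3) {
            return Position.TOP;
        } else if (centerY < 2 * screenHeight / 3) {
            return Position.MIDDLE;
        }
        return Position.BOTTOM;
    }

    public static void main(String[] args) {
        // A 100-pixel-tall banner anchored at the bottom of a 1920-pixel-tall screen.
        System.out.println(classify(1820, 1920, 1920)); // BOTTOM
    }
}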
3.3 Ad Topic Impact Analysis
The goal of ad topic impact analysis is to determine the relationship between mobile ad usage
and app ratings, and to distill a set of recommendations to help app developers more effectively use ads in their apps.
For non-UI ad topics, I first collected the reviews for each of the subject apps and analyzed the reviews that had keywords related to ads (regex = ad/advert*) or to any of the non-UI aspects with respect to the non-UI ad topics identified in Section 3.2 (regex = power/drain/recharg*/battery/batery/network/bandwidth/slow/hang). Then I calculated the percentage of one- and two-star reviews in which users complained about ads or one of the non-UI aspects. To determine the impact of these reviews on the ratings, I further recalculated each app's new rating if the reviews complaining about either ads or one of the non-UI aspects were to be removed.
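The sketch below illustrates these two steps on a toy set of reviews, using simplified versions of the regular expressions quoted above; the review representation and the exact matching rules are assumptions for illustration only.

import java.util.List;
import java.util.regex.Pattern;

public class AdComplaintImpact {
    // Simplified, case-insensitive versions of the keyword patterns quoted above.
    static final Pattern AD_PATTERN =
            Pattern.compile("\\b(ad|ads|advert\\w*)\\b", Pattern.CASE_INSENSITIVE);
    static final Pattern NON_UI_PATTERN = Pattern.compile(
            "power|drain|recharg\\w*|battery|batery|network|bandwidth|slow|hang",
            Pattern.CASE_INSENSITIVE);

    static class Review {
        final String text;
        final int stars;
        Review(String text, int stars) { this.text = text; this.stars = stars; }
    }

    // Average rating over all reviews, or over only the reviews that remain after
    // dropping ad or non-UI complaints, to estimate their impact on the rating.
    static double averageRating(List<Review> reviews, boolean dropComplaints) {
        double sum = 0;
        int count = 0;
        for (Review r : reviews) {
            boolean complaint = AD_PATTERN.matcher(r.text).find()
                    || NON_UI_PATTERN.matcher(r.text).find();
            if (dropComplaints && complaint) {
                continue;
            }
            sum += r.stars;
            count++;
        }
        return count == 0 ? Double.NaN : sum / count;
    }

    public static void main(String[] args) {
        List<Review> reviews = List.of(
                new Review("Too many ads and it drains my battery", 1),
                new Review("Great game, love it", 5),
                new Review("Useful and simple", 4));
        System.out.printf("Rating with all reviews: %.2f%n", averageRating(reviews, false));
        System.out.printf("Rating without complaint reviews: %.2f%n", averageRating(reviews, true));
    }
}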
For UI-related ad topics, I leveraged the different aspects of ad visual patterns quantified in Section 3.2 and performed an extensive statistical analysis to determine whether these aspects have an impact on the user ratings of real-world marketplace apps. In particular, I divided the apps into two groups, S1 and S2, where one group represents apps with the aspect that I wanted to evaluate (e.g., more than two ads on the same page) and the other group does not (e.g., at most one ad on a page). For example, to determine whether ad format correlates with app ratings, I compared apps that had banner ads (S1) against apps that had interstitial ads (S2). I then calculated the statistical significance of the result by comparing the ratings distributions of the two groups. Statistically significant results indicate that the corresponding ad aspect has an impact on user ratings.
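For instance, the two groups' rating distributions could be compared with a non-parametric test such as the Mann-Whitney U test. The sketch below uses Apache Commons Math (an added dependency) and made-up rating arrays; the choice of test and threshold here is illustrative, not a claim about the exact statistical procedure used in Chapter 5.

import org.apache.commons.math3.stat.inference.MannWhitneyUTest;

public class RatingGroupComparison {
    public static void main(String[] args) {
        // Illustrative ratings: S1 contains apps with the ad aspect under study
        // (e.g., interstitial ads), S2 contains apps without it (e.g., banner ads).
        double[] s1Ratings = {3.9, 4.0, 3.7, 4.1, 3.8, 3.6, 4.0};
        double[] s2Ratings = {4.3, 4.5, 4.2, 4.4, 4.1, 4.6, 4.3};

        double pValue = new MannWhitneyUTest().mannWhitneyUTest(s1Ratings, s2Ratings);

        // A small p-value suggests the two rating distributions differ, i.e., the
        // ad aspect is associated with a difference in app ratings.
        System.out.printf("Mann-Whitney U test p-value: %.4f%n", pValue);
        System.out.println(pValue < 0.05 ? "Statistically significant" : "Not significant");
    }
}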
Chapter 4
An Empirical Study of Mobile In-app Ad Reviews
In just a matter of a few years, the global market has experienced a tremendous increase in the
number of apps that consumers use on their smartphones. As of March 2017, both the Google
Play and Apple app stores boasted over two million apps [98]. Along with this growth in apps,
mobile advertising in apps has become an important source of revenue for software developers
[24]. In 2010, the mobile advertising industry's revenue was just over half a billion dollars [106].
By 2018, analysts predict that revenue from mobile advertising will reach 160 billion dollars and
account for 63% of all global digital advertising spending [93].
In the mobile advertising ecosystem, there are four main stakeholders: end users, developers,
advertisers, and Mobile Ad Networks (MANs). To earn ad revenue, developers embed and display
ads in their apps. MANs, such as Google Mobile Ads, facilitate the interaction between developers
and advertisers. To do this, MANs maintain and distribute libraries that enable developers to
include ads served by the MAN in their apps. When an end user clicks on or views ads, the
developer receives a small payment from the MAN on behalf of the advertiser.
An important additional, but somewhat indirect, player in the mobile ad ecosystem is the app
store. Users can leave reviews and rate apps in the app store. This feedback can influence the
behavior of other users, who may avoid negatively rated apps, and can also be a source of useful
bug reports or suggestions for improvement for developers. Prior studies have shown that these
reviews cover a wide range of topics, such as the app's functionality, quality, and performance [59]
and that specific areas of complaints, such as user dissatisfaction with ads, can negatively impact
the ratings an app receives [44]. It is in the interest of developers to avoid negative reviews and
ratings as these will make their app less appealing to new users or cause the app to be ranked
lower by the app store. In turn, fewer downloads of the app are likely to lead to fewer ad clicks and
views, which cuts into potential advertising revenue that could be earned by developers. However,
in the case of ads, the situation is more complicated. Developers will not simply remove ads, but
must find a balance in their use of advertising that avoids a negative experience while still enabling
them to earn advertising revenue.
The effect of reviews and ratings on developers' advertising revenue motivates them to un-
derstand what aspects of ads could cause negative or positive experiences. However, developers
lack practical and even basic information about which ad-related aspects are more or less likely to
produce a negative experience for their users. Although many developer blogs (e.g., [19, 78, 111])
attempt to provide such guidance, and even ad networks often suggest "best practices" (e.g.,
[39, 10]), this information is often anecdotal, lacks rigorous evidence to support the advice, or
is too generic to provide developers with concrete guidance. Furthermore, developers lack a sys-
tematic ability to analyze and understand ad-related reviews. Although there has been extensive
study of app store reviews (e.g., [20, 59, 74]), this work has not focused on ad-specific aspects.
To address this issue, I conducted an empirical analysis of ad-related reviews. In this chapter,
I present the results of this investigation, which enabled me to identify many different aspects of ads that frequently trigger ad complaints and positive comments on ad improvement. To carry out this investigation, I followed a systematic approach:
1. I began by identifying ad-related reviews from a corpus of over 40 million app store reviews.
I found that there were, in fact, a large number of user reviews that discussed mobile
advertising.
2. I then analyzed the ratings and text of the reviews and found that those that mentioned
advertising were disproportionately more likely to receive lower ratings than non-ad-related
reviews.
3. I then systematically analyzed a statistically significant sample of these reviews to identify
the most common topics of end user complaints on ad aspects or positive comments on ad
improvement. To do this, two external students, one from the University of Waterloo and the other from USC, were recruited to label ad topics for each of the ad reviews. Based on the results, I found that most ad reviews were complaints related to how ads interfered or interacted with User Interface (UI)-related aspects of the mobile app. In particular, I found that UI issues relating to how frequently the ad was shown, whether the ad popped up, and where the ad was placed were the most frequently mentioned complaints. For non-visual aspects, behaviors such as the ad blocking or crashing the host app's execution, automatically downloading files, or changing system settings were the most frequently mentioned complaints.
Overall, these results showed clear trends in users' ad-related reviews that can help developers
to better understand the aspects they should be most concerned about when placing ads into their
apps. I believe better understanding of these aspects can help developers improve the overall app
user experience and thus allow them to continue to take advantage of the potential mobile ad
revenue.
The rest of this chapter is organized as follows. In Section 4.1, I introduce and motivate each of
the research questions I address in this chapter. In Section 4.2, I describe the details and results
of the analysis I carried out for each of the research questions. The threats to the validity of my
results are discussed in Section 4.3. Finally, I summarize my findings in Section 4.4.
4.1 Research Questions
My investigation broadly focuses on end users' app store reviews that relate to mobile advertising.
Below I more formally introduce and motivate my research questions (RQs).
RQ1: How frequent are ad-related reviews among all app reviews?
An app store allows users to write textual reviews of the apps they have downloaded. If users
have had a positive or negative experience with mobile advertising, they may comment on this
in their reviews, which consist of two parts: the textual content and a numeric rating for the app. The rating is typically provided by a user on a scale of one to five, with five being the highest.
Developers care about these ratings because they influence how app stores display apps in response to a user search. Higher-rated apps tend to be given more priority when displayed to the user
[113].
Therefore, my first, and most basic, research question examines the frequency and ratings
distribution of app reviews that include comments on mobile advertising. The results of this RQ
can serve to inform developers as to how prevalent such reviews are in the corpus of all reviews,
and the type of influence ad-related reviews are having on an app's overall rating.
RQ2: What are the common topics of ad reviews?
Complaints or positive comments in ad-related reviews may be due to numerous reasons. For
example, users may be upset when an ad is interfering with the display of important information.
However, it is unknown what concrete aspects contribute to the interference. For developers it is
important to understand what aspects of ad usage or behavior are causing user complaints so that
they can focus their efforts on these aspects, and which ad aspects users comment on positively so that they can maintain these merits for sustained ad revenue. Therefore, in this research question
I am interested in determining the ad-related topics, that is, the specific aspect or issue related
Figure 4.1: Distribution of the number and ratio of ad reviews across app categories: the left Y axis represents the number of ad reviews (in blue color), and the right Y axis represents the ratio of ad reviews (in pink color)
to ads that users are complaining about or praising in a review. Based on the topics, I further
analyze whether ad complaints or positive comments have a relationship with app ratings. The
results of this RQ would inform developers as to the most problematic or appealing aspects of ad
usage and guide them in determining how to improve their apps' ad usage.
4.2 Results and Discussion
In this section, I discuss the details of the approach I employed to address each of the RQs, present
the results I obtained, and discuss the implications of these results with respect to each of the
RQs.
I first describe the protocol for obtaining the data required to answer all RQs. The data preparation includes the collection of both app reviews and other app-related information. To collect app reviews, I crawled the Google Play app store every day, using a farm of systems, for over two years to download every new release (i.e., APK) of each app and its associated meta-data, such as the average user rating, user reviews (500 every day), and their corresponding app ratings, among other things. For this collection, I downloaded the top 400 apps as ranked by Distimo [5] (now called App Annie [9]) in each of the 30 officially recognized app categories. In the end, I collected a corpus of 10,750 apps and over 40 million app reviews. These reviews, together with their user ratings and categories, were used to address each research question.
4.2.1 RQ1: How frequent are ad-related reviews among all app reviews?
Approach: To answer this research question, I applied the regular expression (i.e., regex =
ad/ads/advert*) to identify ad-related reviews from all of the collected app reviews. I chose these
particular keyword variations based on guidelines provided in related work that also examined
user reviews for different types of user complaints [59]. I then counted the frequency of ad-
related reviews and their percentage with respect to all of the reviews. I also calculated similar
information for each of the 30 standard app categories.
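One plausible way to implement this keyword filter (a sketch, not the exact expression used in the study) is a case-insensitive, word-bounded pattern, so that words such as "bad" or "add" do not produce false matches:

import java.util.regex.Pattern;

public class AdReviewFilter {
    // Case-insensitive, word-bounded rendering of the "ad/ads/advert*" keywords.
    private static final Pattern AD_REVIEW =
            Pattern.compile("\\b(ad|ads|advert\\w*)\\b", Pattern.CASE_INSENSITIVE);

    static boolean isAdReview(String reviewText) {
        return AD_REVIEW.matcher(reviewText).find();
    }

    public static void main(String[] args) {
        System.out.println(isAdReview("Way too many ads after the update"));     // true
        System.out.println(isAdReview("Not a bad game, please add more maps"));  // false
    }
}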
Results: In total, I obtained 529,827 ad-related reviews out of over 40 million app reviews. This indicates that approximately 1.3% of all user reviews dealt with in-app ads. Note that since I only considered reviews that explicitly mentioned one of the keywords, this ratio should be considered a lower bound, as it is possible I did not capture all of the ways users comment on ads. The ad review distribution among different app categories is shown in Figure 4.1, where,
besides frequency, I also report the ratio of the number of ad reviews to the number of all app
reviews. The blue bar shows the number of ad reviews (w.r.t. left Y axis) and the pink line shows
the ratio (w.r.t. right Y axis). The X axis lists all 30 app categories. As the results show, the
frequency of ad reviews varies among dierent app categories. Nine categories have an ad review
ratio over 1.5%, and eight have less than 0.8%. As for the absolute numbers, seven have over
30,000 ad-related reviews, and one has over 50,000. The median number and ratio of ad reviews
per category are 11,669 and 0.96%, respectively.
Discussion:
From the results, we can see that although the ratio of ad reviews looks small, the amount is non-negligible. In fact, there is a large number of such reviews, which are feedback directly from end users. In addition, mobile advertising has become one of the main sources of revenue that developers receive when they publish free apps in the app store. From this perspective, ad-related reviews are worthy of developers' attention. Related work [44] has shown that these reviews have a measurable impact on app ratings.
I also found that both the frequency and ratio of ad reviews varied by category. In particular,
two app categories, BRAIN and ENTERTAINMENT, were high in both the number and ratio of
ad reviews. Another two categories that are worthy to note were COMICS and MEDICAL. They
had a high ad review percentage in spite of the relatively small number of ad reviews. This re
ects
that end users commented on apps in COMICS and MEDICAL not as actively as in other app
categories, but experiences with ads tended to be one of the big problems that would cause users
to complain. The potential reason for such a higher number or ratio in any of the four categories
Table 4.1: Distribution of star ratings among all user reviews

Review rating                 1 star   2 stars  3 stars  4 stars  5 stars
% of ad-related reviews       33.29    13.21    14.51    17.1     21.89
% of non-ad-related reviews   12.13     4.51     7.27    15.11    60.98
mentioned above could be that developers are more aggressively embedding ads in apps of these
categories or that users of such apps have a lower tolerance for ads.
Besides the frequency of ad reviews among the corpus of app reviews, I am also interested in
the frequency distribution of both ad and non-ad review ratings (i.e., "stars"). Table 4.1 shows
the distribution of all reviews across the different rating stars. We can see that, for all ad-related
reviews, almost half (about 46%) have one or two star ratings. In contrast, most of the non-ad
reviews have five star ratings, with only about 17% having one or two stars.
These results show that reviews mentioning ad-related topics disproportionately have a lower
rating than non-ad-related reviews. Thus these reviews, with their corresponding ad-related
topics, can convey valuable information about what kind of ad aspects developers should address
to improve their app ratings, and what kind of ad aspects developers should keep to maintain
their ad revenue. The high ratio of low ratings among ad-related reviews also suggests that ad-related
complaints can have a significant negative impact on app ratings. In particular, for the
corpus of ad-related reviews, the average rating is 2.81 stars, while this rating is 4.08 stars for
non-ad-related reviews. I recalculated each app's new rating if all ad-related reviews were to be
removed. The average increase in rating among 10,750 apps would be 0.015 stars, which though
small, could have a great impact on an app's success (e.g., ranking in the app store) [44].
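As a quick sanity check (not part of the dissertation's protocol), the two averages above can be recovered directly from the Table 4.1 percentages, and the per-app recalculation can be sketched as follows; the reviews variable is a hypothetical placeholder for one app's review list.

# Recover the reported averages from the Table 4.1 distribution.
ad_pct     = {1: 33.29, 2: 13.21, 3: 14.51, 4: 17.10, 5: 21.89}
non_ad_pct = {1: 12.13, 2:  4.51, 3:  7.27, 4: 15.11, 5: 60.98}
avg_ad     = sum(star * p for star, p in ad_pct.items()) / 100      # ~2.81 stars
avg_non_ad = sum(star * p for star, p in non_ad_pct.items()) / 100  # ~4.08 stars

def rating_without_ad_reviews(reviews):
    # reviews: hypothetical list of (stars, is_ad_related) tuples for one app.
    kept = [stars for stars, is_ad in reviews if not is_ad]
    return sum(kept) / len(kept) if kept else None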
I also found that the rating distribution of ad reviews varied by category. In particular,
app categories such as FINANCE, HEALTH AND FITNESS, and TOOLS had the lowest ratio
of reviews with one or two star ratings (i.e., the highest ratio of four or five star ratings), while
NEWS AND MAGAZINES and SPORTS had the highest ratio (over 60%) of reviews with one or
two stars. This reflects that end users are more open to accepting ads in apps that provide working
or personalized information (such as finance) than in those that provide general information (such as
news). The potential reason could be that news-related apps are used by end users for longer
periods, or that mobile ads in these apps are more disruptive than those in apps of other categories.
The actual reason behind this observation is outside the scope of this dissertation.
In fact, I would like to investigate the underlying relationship between app category and rating
distribution in future work.
Overall, the above results motivate me to inspect ad reviews in the following section (i.e.,
Section 4.2.2) to determine what ad aspects users care about and thus matter to developers.
In particular, knowing what users do not like about ads could give developers a chance to address
those issues and possibly improve their ratings.
4.2.2 RQ2: What are the common topics of ad reviews?
Approach:
To answer this research question, a manual analysis of a subset of the ad-related reviews was
conducted to determine the most common ad-related topics. I chose a manual analysis over an
automated analysis, such as K-means [52] and Latent Dirichlet Allocation (LDA) [17], because
in prior experience, the automated analyses were less accurate at identifying new and distinct
topics than a manual analysis, which was also widely used in related work such as [59, 31]. For
the manual analysis, I leveraged a research technique called qualitative content analysis [29],
which is a research method for the subjective interpretation of the content of text data. This
technique allows me to identify the ad-related topics in a subjective but scientific manner through
a systematic classification process.
The core operation of qualitative content analysis is coding, which is the process of organizing
and sorting qualitative data [29]. The result of this process is a set of codes that serve as a way
to label and organize data by topic. In my approach, the topic assigned to each ad review was
a code, and I adopted a hybrid process of creating codes: the coder started coding based on a
preset of codes (also referred to as "a priori codes") that were defined offline, and created new codes
(i.e., emergent codes) for reviews and added them to the preset when new topics were found. To
code ad reviews, the following questions were asked in a systematic way for each review:
1. Is this ad review related to ad aspects or non-ad aspects?
2. Is this ad review a positive comment, neutral comment or complaint?
3. Is the information being conveyed by this ad review descriptive or non-descriptive?
4. If descriptive, what are the topic(s) of this ad review?
In this study I utilized 400 randomly selected ad reviews for analysis. To identify these
reviews, I randomly sampled the collected corpus of 529,827 ad-related reviews that were filtered
through the regular expression described in Section 4.2.1 to obtain 400 reviews. This sample size
(i.e., 400) gave us a 95% confidence level with a 5% margin of error, which ensured a high degree
of confidence that my categorization results would be indicative of the larger population. For
example, if 80% of the 400 sample reviews were complaints about ads, then we would be 95%
certain that between 75% (80 - 5) and 85% (80 + 5) of the 529,827 ad-related reviews
would be user complaints about ads. The unit of analysis was a user review that contained ad-related
information. For coding purposes, only the text content of the ad reviews was provided
for analysis. I hid meta-data, such as the corresponding app name or user rating, to
ensure that coders would not be affected by such information during the coding process.
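As a back-of-the-envelope check (not part of the original protocol), the sample size above follows from the standard formula for estimating a proportion, assuming maximum variance (p = 0.5) and ignoring the finite-population correction:

n = z^2 * p * (1 - p) / e^2 = (1.96^2 * 0.5 * 0.5) / 0.05^2 ≈ 384.2,

so rounding up to 400 reviews meets the 95% confidence level (z = 1.96) with a 5% margin of error.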
I then developed a coding manual that consists of topic names, definitions or rules for assigning
codes to ad reviews, and examples, following well-known best practices [75]. To make the coding
manual as complete as possible, I tested it on a sample set of 100 ad reviews with a preset of codes
and checked coding consistency, which allowed me to revise the coding rules in the manual. For
example, I coded the review text "ads" as non-descriptive but "ads ads ads" as a complaint about ad
frequency. I continued this iterative process until sufficient coding consistency was achieved.
After the coding manual was completed, I recruited two external students as coders, one from
the University of Waterloo and the other from USC, to code the 400 ad reviews. The two coders
coded the 400 ad reviews independently by referring to the coding manual. I did this to ensure the
reliability of the coding results, which could otherwise be affected by individual subjectivity if the
analysis were done by myself. Note that two coders are enough for this analysis, and we would not
have achieved higher reliability by adding more coders [46], since no context other than the ad
review content was provided to the coders during the coding process. Such an approach has also been
widely used in the literature, such as [7, 82, 102]. Each coding process was conducted on Windows
using NVivo Pro 11 [55], a software package widely used for qualitative content analysis.
I then analyzed the consistency of the results collected from both coders. To do this, I calculated
the inter-rater agreement [92] to determine the degree of consensus between the two coders. The
first agreement I calculated was the review-topic level agreement. This represents the percentage
of common topic results (intersection set) between the two coders out of all coding results (union
set). The ratio I obtained was 0.86, which means that 86% of the reviews had the same topics
assigned. The second agreement I calculated was the percentage of each review's content on which
the two coders made the same coding decision (i.e., both assigned the same topic or neither assigned
a topic). The mean percentage agreement I obtained was 86%, and the median was 97%. This
indicates that for those reviews with the same topic, there is a high degree of overlap regarding
the topic. The third agreement I calculated was the Cohen's Kappa coefficient [46], which takes
into account the amount of agreement that could be expected to occur through chance. The
mean/median coefficient I obtained was 0.57, which represents moderate agreement according to
the Kappa guidelines [54]. Note that this agreement is not as good as the previous two. The
reason is that sometimes one coder focused on assigning topics to the whole
review instead of only part of it. Overall, the three agreement metrics above show that the coding
results are reliable for analysis.
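The two review-level agreement computations can be sketched as follows; this is a minimal illustration with hypothetical labels, not the NVivo workflow actually used in the study, and it assumes scikit-learn for the Kappa computation.

from sklearn.metrics import cohen_kappa_score

def topic_agreement(topics_a, topics_b):
    # Review-topic level agreement: topics assigned by both coders (intersection)
    # over all topics assigned by either coder (union).
    return len(topics_a & topics_b) / len(topics_a | topics_b)

# Hypothetical per-review topic labels from the two coders.
coder_a = ["frequency-c", "popup-c", "non-descriptive", "frequency-c"]
coder_b = ["frequency-c", "popup-c", "frequency-c",     "frequency-c"]
kappa = cohen_kappa_score(coder_a, coder_b)   # chance-corrected agreement
print(topic_agreement({"frequency-c", "popup-c"}, {"frequency-c"}), kappa)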
Results and Discussion:
1) High-level Topic Distribution
My first result is a high-level classification of all 400 ad-related reviews. I found that most of
the ad reviews, across various ratings, were indeed about ad aspects. Of the 400 reviews, only 1.5%
(others) were about non-ad-related aspects. Of the remaining reviews, which were related to ad
aspects, 69% were descriptive and 29.5% non-descriptive.
From another perspective, I found that most of the ad reviews were user complaints. This
aligns with the finding in Section 4.2.1 that ad-related reviews have lower star ratings than
non-ad-related reviews. Of the 400 reviews, 91.8% were complaints about ad aspects, 4% were neutral, and
4.2% were positive comments on ad improvements. Among those reviews with positive comments,
I could not find any review that praised an ad itself. Instead, users made positive
comments on ad improvements (e.g., fewer ads or a fixed ad position) after comparing with an earlier
version of the app.
Of the descriptive ad reviews, I found that most dealt with topics that could be considered UI-related,
focusing on visually observable aspects (e.g., size and location) of the ads. Altogether, reviews
on these topics represented about 71.5% of all of the reviews. The non-UI-related ad topics dealt
with ads' functional properties (e.g., a slowdown in the app's execution or unexpected audio). They
represented about 25% of all ad reviews. Note that some reviews could have both UI-related and
non-UI-related topics. For example, the review "Ads, Ads everywhere. Popups also. Expected battery
drain" conveys information about multiple ad aspects (both UI and non-UI), such as frequency,
format, and energy consumption.
In total, there were 549 occurrences of different ad topics. I summarize the topics of the 400 ad
reviews in Table 4.2, where I list each topic, its high-level type information, the percentage of the
400 reviews that contained the topic, and example reviews. Topics with the suffix "-c" represent
complaints, "-n" neutral comments, and "-p" positive comments. I now discuss the findings related
to UI and non-UI aspects with respect to both complaints and positive comments.
2) UI-Related Ad Topics
Frequency: This topic (frequency-c/p in Table 4.2) deals with how often ads appear
in an app. Current mobile ad networks pay developers based on the number of ad clicks or
ad impressions they achieve in their app. Therefore, developers are incentivized to encourage
Table 4.2: Common topics and their example reviews among 400 ad reviews in the manual study
(the percentage after each topic is its share of the 400 sampled ad reviews)

Type: UI-Related Ad Complaint (total #: 279)
  frequency-c (30.75%): "Ads every where"; "Ads ads and more ads"
  popup-c (12.75%): "...Ads kept popping up I cant use the freaking app!"
  time-c (7%): "...This game has way too many ads, many of which come up when loading the game to start off with and close the game out when the ad video ends"; "Ads ruin it! Im in the middle of playing and ads pop up and make my time run longer"
  location-c (6.75%): "Bad. Ad Placement. Ads get in the way..."; "Ads are placed poorly. I dont want them in the conversation list..."
  content-c (5.75%): "Rubbish. Ads and sounds made from someone blowing raspberries"; "Ad tricks! To much ad tricks. Making you click on the stupid ad to install some bullshit..."
  size-c (4.25%): "...Ad takes up too much space on phone screen"; "Huge ad. Ad takes up 1/4 the screen! Thats ridiculous. Uninstalling"
  video-c (3.25%): "Too many Ads!! Ads after every play and unexpected very loud videos"
  notification-c (2%): "Ad pop ups are annoying. Forced to get notifications"; "Warning. Ad based service, that constantly sends you notification with recommendations"

Type: Non-UI-Related Ad Complaint (total #: 100)
  block-c (9.25%): "Ads don't load. Having recurring problem loading ads and pictures"
  paid-c (5%): "Paid for upgrade... Ads? Ads in the new version even though I paid..."
  crash-c (4%): "Ads Crash App. Video ads cause the app to crash"
  automated-c (3.5%): "Ads play at full volume even when your system volume is off..."
  privacy-c (1.5%): "Ad supported means data mining. Not a chance"
  slow-c (1.25%): "Ads slow it down. Used to work fine now flashy ads make the game lag..."
  battery-c (0.5%): "Ads. Ads everywhere. Popups also. Expected battery drain"

Type: UI-Related Ad-Positive Comment (total #: 7)
  frequency-p (1.25%): "Does SO much for FREE! Ads are minimal..."; "My fave. Ads are not overwhelming..."
  location-p (0.25%): "Finally fixed FB issues. Ads on playing board screen removed. Dont bring them back"
  popup-p (0.25%): "Amazing. Ads Free. No Popup"

Type: Non-UI-Related Ad-Positive Comment (total #: 1)
  paid-p (0.25%): "...Ad-free is the way to go! $.99 hell yea!"

Type: Non-descriptive (total #: 162)
  complaint (32.25%): "Good. Ads are irritating"; "Disgusting. Ads and ads"; "Ads are annoying"
  neutral (5.25%): "Ads are present"
  positive comment (3%): "Ads work great..."
user clicks and ad views. A developer may believe that one way to achieve this is by displaying
more ads in their app. However, users may be annoyed by too many ads, since they can be
distracting or unsightly. Among the UI-related ad complaint topics that I analyzed, frequency
is the one that was complained about most by end users. In particular, users commented
on this topic using keywords such as "many/much/overwhelming/galore/blaster/overkill/everywhere/bombarded/spam/frequent/overload..."
I further investigated each of the reviews under this complaint topic. At a high level, I found
that there were three different aspects associated with the topic.
The first aspect is activity-based. In Android apps, each activity in the source code is executed
and rendered as a page or screen. There are two different types of complaints among user reviews:
1) The ad ratio, which represents the number of an app's pages that contain ads versus the total
number of pages in the app. Users complained that many pages contained ads; for example, one
user complained "ads every 3 or 4 puzzles." 2) The number of ads per page, which represents the
number of ads displayed simultaneously on the same page. For example, one user complained,
"watching at least 2 ads for every article you want to read."
The second aspect is time-based. This metric represents ads being displayed once per time
interval (e.g., every five seconds) during a user's interaction with an app. For example,
one user wrote, "Ads pop up every 2 seconds."
The third aspect is event-based. Android apps support the user experience through an event-driven,
GUI-based design. Users complained about ad frequency in their reviews by mentioning
ads displayed "every tap/every button push/every touch/every move." For example, one user
complained, "ads on EVERY button push. Horrible app."
Although ad frequency was the most complained-about aspect, I also found it was the aspect
most positively commented on by end users (see frequency-p in Table 4.2). Users rated an
app positively when the app had fewer ads or was ad-free. In Section 4.2.2.4, I discuss these
positively commented and criticized aspects in more detail.
Format: This topic (popup-c/p and video-c in Table 4.2) corresponds to different types of
ads. In general, there are two types of ads that can be included in an app by a developer: banner
and interstitial. Ads that occupy the full screen are called interstitial ads. Others, which appear
as narrow horizontal strips, are typically referred to as banner ads. Interstitial ads are displayed
in the form of a pop-up image or video and generally have a higher payout than banner ads [14].
However, interstitial ads interrupt a user's interaction with an app by requiring the user to view
the ad for a time interval or click a close button to return to the app. This behavior could make
users react negatively to interstitial ads.
For the reviews that contained popup-c, most (27 reviews) were complaints about frequent
interstitial ads, such as "Ads keep popping up." Many were complaints about when the ads popped
up, such as "in game" or "after closing the app." Some were complaints about the position of ads
that covered non-ad elements and prevented users from interacting with the app. A few were
complaints that popups appeared in the paid version of apps, forced users to restart the app, or
occupied the full screen. For the reviews that contained video-c, users mainly complained that
video ads had high data usage, played unexpectedly at full volume, repeated constantly, or made
the app crash. Besides the above complaints, I also found one user who rated an app highly because
there were no popup ads.
Timing: This topic (time-c in Table 4.2) is related to when an ad is shown or how long the
ad is displayed. For example, a sure-fire way to make a user see an ad is to place the ad on a
landing page that is shown to the user when the app starts or before the next level in a game app.
However, complaints from users indicate that this may affect users' impressions of the app.
I found that over half of the complaints on this topic were about the improper timing of when an
ad was displayed. This can be right after the app starts (e.g., "ad sites load upon initially
launching"), during play ("in the middle of playing and ads pop up"), or after play ("jumping to
an ad as soon as a round is over"). The other type of complaint concerned the interval or duration
for which ads were displayed. The following are some examples: "30 second ads are a bit much," "ads
last a long time," "ads interrupt every 90 seconds."
Location: This topic (location-c/p in Table 4.2) is about where an ad is placed within the app's
UI. Developers may place ads anywhere in the UIs of their app. Odd positions may increase
the attention brought to the ad, which could lead to more clicks. However, ad position directly
impacts the layout of other elements on the page. Ads in certain positions, such as the middle of
the page, may be disruptive to the usability of the app, since other elements can then only occupy
the upper and lower halves of the page. Overall, there were three different aspects of ad position
that were complained about.
The first aspect is the placement of ads on certain screens that are supposed to be ad-free. The
following are highlights from example reviews: "ads between the news stories," "ads on the actual
gameplay screen," "ads showing on the actual conversation screen," and "ads in the conversation
list."
The second aspect is the relative position (e.g., top, middle, or bottom) of ads on the screen. All
three positions were complained about by end users among the 400 reviews, including comments
such as "ads at top of screen are opening," "ads on the middle hate," and "ad at bottom is why
I uninstalled."
The third aspect is the spatial relationship of ads with other, non-ad objects on the screen.
This relationship includes ads adjacent to app elements, ads sandwiched between app items, and
ads overlapping app content. For the spatial relationship, most reviews were complaints about
ads blocking other content (e.g., sandwiched or overlapping), such as "ads doesn't allow to see
the bottom line," "ads cover part of the screen blocking half the game," and "ads block vision of
pictures." There was one review complaining about the adjacent relationship: "ads are to close to
emails in inbox." I also found one user who commented positively on an app when ads were removed
from the playing board screen.
Content: This topic (content-c in Table 4.2) deals with what is in the ads shown to users.
Appealing ad content can help catch users' attention. However, I found evidence in the reviews
of a problematic side to this. Namely, users complained when the content of an ad was misleading
or irrelevant to the app's functionality. In particular, there were three different aspects of
complaints related to ad content:
1. Inappropriate content, which reflects an improper message conveyed by ads, such as offers,
a warning of a virus, a warning of a lack of space, or hot singles.
2. Repeated content, which reflects the same ads being displayed during the whole interaction, such
as a KFC ad appearing over and over, the same ads appearing every single time, or repeating
video ads.
3. Camouflage as app content, which reflects an ad disguising itself as the type of content that
users are more likely to click by accident, such as ads masquerading as content, or ads that look
like news or email.
Size: This topic (size-c in Table 4.2) focuses on how big (e.g., in area, width, or height) an ad is
in relation to the app's UI. To attract users' attention to ads, developers may be tempted to make
their ads bigger so they stand out. However, as the ad size becomes larger, it can affect the users'
ability to interact with the app. At a high level, users complained about ad size from two different
perspectives: static and dynamic ad size. Most complaints were about ads whose sizes were fixed
and took up much of the screen's real estate. For example, one user commented,
"Huge ad. Ad takes up 1/4 the screen." Another user wrote, "banner ads take up your screen
and obscure play." I also found three reviews that complained about increasing ad size. For
example, "ads slowly creap up the page until they've filled the entire screen." Another example
stated, "ads at the bottom started getting bigger eventually blocking the next level button."
Notification: This topic (notification-c in Table 4.2) represents the ad-related messages
shown to users through the status bar. Besides displaying relevant content, ads can attract users'
attention by sending an alert or notification to the status bar. However, notifications can also
trigger users' complaints. One user commented that "ads show up on notification screen and icons
show up." Another user rated the app as "Ads Notification. Is too bad."
3) Non-UI-Related Ad Topics
Blocking: This topic (block-c in Table 4.2) includes ads that disabled the normal functioning
of an app due to the improper running of code. It is different from popup in that an ad's running
prevents the execution of non-ad-related functions, and this issue cannot be resolved through the
screen GUI. This blocking aspect covers two different dimensions: the ad itself and its host
app. For the ad dimension, the ad cannot be fully loaded or closed. For example, "ads won't
load with just the picture and no buttons," "ads don't load so the video can't load," or "ads can't
close." For the app dimension, the ad prevents the running of the host app. For example, "with
ads my mms messages won't download," "ads freeze my phone," or "when the ad loads it stops
my download."
Paid: This topic (paid-c/p in Table 4.2) is about charging fees related to mobile ads. The
ad revenue that developers receive consists of two parts: revenue from ad networks for publishing
a "free but with ads" version of an app, and revenue from end users for providing paid apps that
charge users for ad-free functionality. However, from the users' perspective, paying in connection
with mobile ads can cause complaints.
In particular, I found three different costs complained about by users: 1) ads in paid apps,
including complaints such as "ads in the new version even though I paid," "ads still visible with
paid app," or "ads in subscriber version"; 2) explicit payment for ad-free functionality, including
complaints such as "I have to spend $.99 a month or $6.00 a year for premium to get rid of ads,"
or "I will pay for a no ad version"; and 3) implicit payment for ad usage (e.g., data), including
complaints such as "ads are using my data that I have to PAY for."
I also found one user who appreciated the ad-free app after paying.
Crash: This topic (crash-c in Table 4.2) conveys that ads could make an app or its functions
crash. Sometimes ads are poorly implemented and are not compatible with their host app. The
end result is that the app or its functions keep crashing. In this case, users cannot interact with
any function of the app. Examples of user complaints are as follows: "ads now crash app," "video
ads cause the app to crash," "the ads keep crashing the video and sometimes the app," and "you
lost progress when an ad popped up."
Automated: This topic (automated-c in Table 4.2) reflects automated behavior related
to ads. In the mobile advertising ecosystem, ads inherit permissions from the host app. This
allows ads to perform functions in an automated manner if the host app is granted permission. Such
behaviors can be considered malicious or extremely annoying. Once noticed, these ads incite
complaints from users. In particular, there were four kinds of such behaviors complained about
by end users:
1. Ads changed/fixed system configurations, which includes reviews such as "ads playing
at full volume even when the system volume is off," and "ads giving obnoxious music that
you cann't mute."
2. Ads were opened without a click, which includes reviews such as "ad always showing up
even i don't click them," and "ads opening without being touched."
3. Ads opened external apps, which includes reviews such as "ads automatically open your
web browser" and "ads automatically open the play store ever 2 seconds."
4. Ads downloaded files, which prompted reviews such as "ads keep auto download."
Privacy: This topic (privacy-c in Table 4.2) is about the sensitive information accessed
by mobile ads. The host-parasite relationship allows ads to access sensitive information on the
smartphone if the host app is granted the permission. However, users may be upset to learn that
in-app ads obtain these permissions as well. Examples include: "ads and permissions granted are
wrong," and "ads in your app are used to spy on us."
Slow: This topic (slow-c in Table 4.2) means that ads slow down the app's operation. The
inclusion of mobile ads can slow the functionality of the app. The running of ads requires system
resources, such as CPU and memory, which are limited on a mobile device. As a result, fewer
resources can be allocated for the running of the host app, and this slows down the app's execution.
One example user complained that "Ads opens faster than your freakin website."
Battery: This topic (battery-c in Table 4.2) is about the ads' energy consumption. Mobile
in-app ads consume extra resources on a system, such as energy. Components, such as display and
network, are two of the most energy-consuming components on a mobile device [64, 66, 67, 108].
These two components also serve an important role in the mobile ad ecosystem since they are
used to retrieve and display ads. Even though energy-consuming behavior is not directly linked to
the app or the ads it contains, mobile ads routinely consume a significant amount of energy, and
extreme levels of resource consumption may trigger user complaints [44]. There were two such
complaints on this topic, for example: "the ad resource overload is unacceptable."
4) Additional Discussion
From Table 4.2, we can see that some ad aspects received both complaints and positive com-
ments from end users. In other words, on the one hand these aspects can negatively impact an
Figure 4.2: Heat map of the coding-result breakdown of 400 ad reviews in the manual study: each
row (one per ad topic) represents the frequency of ad reviews with ratings from one to five stars
(R1-R5), and the color intensity in each cell represents the number of ad reviews (i.e., the deeper
the color, the larger the number of ad reviews).
Table 4.3: Comparison between complaint and positive-comment topics with regard to the number
of occurrences and the mean and median rating

                   Complaint                Positive Comment
Aspect             #      Mean    Median    #      Mean    Median
frequency          123    1.8     1         5      4.8     5
location           27     1.9     2         1      4       4
popup              51     1.5     1         1      5       5
paid               20     1.5     1         1      5       5
non-descriptive    129    2.2     2         12     4       4
others             3      2       2         2      4.5     4.5
app's success, and on the other hand they can also be improved by developers to promote the
app's success. There is a trade-off for developers when implementing mobile ads, since certain
ad aspects, such as frequent ads, could lead to complaints, while too few ads could decrease ad
revenue.
Another observation is that some ad aspects only have discrete metrics (e.g., format). It would
be interesting to analyze whether certain metrics (e.g., interstitial ads) are more likely to receive
negative reviews than other metrics (e.g., banner ads).
Not all of the above analyses can be resolved in this work. In Chapter 5, I analyze concrete
information about the aspect thresholds or discrete metrics that are likely to lead to negative
ratings, based on the ad-related topics extracted from user reviews.
I also noticed that complaint reviews were likely to have a lower rating than positive-comment
reviews, as shown in Figure 4.2, the heat map with a breakdown of all topics, where the X
axis corresponds to the star rating, the Y axis corresponds to the different ad topics, and each cell
corresponds to the number of ad reviews. The cells with deeper colors represent a higher frequency
of the topic.
I then collected the aspects that had both complaint and positive-comment topics. Table 4.3
shows the comparison. We can see that complaint reviews typically have lower ratings than positive
ones. To validate this finding, I compared the rating distributions between complaint and positive-comment
reviews, based on the coding results, for each of the aspects. To do this, I calculated the
statistical significance of the result using the one-tailed Mann-Whitney U (MWU) test. I used
this test because it does not assume that the measured data has a normal distribution. The
output of the MWU test is a p-value. In my case, I used 0.05 as the significance threshold, below
which I conclude that the difference between the rating distributions of the two sets of reviews is
statistically significant.
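A minimal sketch of this test, assuming SciPy and hypothetical rating lists standing in for the complaint and positive-comment reviews of one aspect:

from scipy.stats import mannwhitneyu

complaint_ratings = [1, 1, 2, 1, 3, 2, 1, 2]   # hypothetical star ratings
positive_ratings  = [5, 4, 5, 5, 4]

# One-tailed test: are complaint ratings stochastically lower than positive ones?
stat, p_value = mannwhitneyu(complaint_ratings, positive_ratings, alternative="less")
print(stat, p_value)   # p < 0.05 indicates a statistically significant difference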
After applying the MWU test, I found a statistically significant difference for four of the six
aspects in Table 4.3. They were: frequency (3.12E-05), popup (0.025), paid (0.038), and
non-descriptive (0.016), where each aspect is followed by its p-value. When comparing all
complaint topics as a single set against all positive topics as the other set, I found a statistically
significant difference with a p-value of 1.95E-11. Moreover, for the descriptive ad aspects (e.g., frequency
to popup in Table 4.3), I also found a statistically significant difference with a p-value of 1.11E-08. In other
words, all of the above results showed that complaint reviews were rated statistically significantly lower than
reviews with positive comments.
The above statistical results show that these complained-about aspects could negatively influence
an app's rating, while positively commented aspects could promote the app's rating. This further
motivated me to measure different ad aspects and understand their relationship with apps' ratings
to determine whether more concrete guidance can be provided to developers as to what kinds of
ad usage patterns are most likely to be viewed negatively by end users (see Chapter 5). For
example, how many ads are "too many"?
4.3 Threats to Validity
External Validity: The analysis in my study was based on reviews for only Android apps.
However, I expect that the user experience of different versions of an app is in most cases comparable
and that the differences for end users are likely minor with respect to ads. In fact, developers
implement several versions of an app that can be published on different platforms. These versions
share the same or similar functionality. Hence I argue that mobile in-app ads impact the user
experience similarly across all platforms.
Internal Validity: To confirm whether the problems associated with the ad topics existed in the
corresponding apps, I installed and manually interacted with the apps corresponding to each of
the 400 ad-related reviews. In total, I successfully installed 377 of the apps for these reviews. The
apps that I interacted with were those for which I had access to the app's primary functions as
a real user would and could interact with long enough to ensure several ad reload cycles. I registered
as a new user, if needed, before entering the main page. Once the ad aspects reviewed by end
users were confirmed in the app, I terminated the interaction. To ensure the ad functionality
was fully loaded, each app was interacted with for at least five minutes unless the ad topic was
confirmed before that. The mobile device I used was a Samsung Galaxy SII smartphone with a
rooted Android 4.3 operating system that was compatible with the original version of most APK
files. For each app, I first restored the system environment to its original state, then I installed
the app on the mobile device. Before beginning the manual interaction, I allowed the system to
sleep for 20 seconds to ensure that the initial page had completely loaded and displayed. Then
I began the manual interaction. I focused on the UI-related features since they are more robust
against outside interference as compared to the non-UI aspects. My results showed that over
80% of ad topics were confirmed for those apps that had ads displayed during the interaction. In
other words, the aspects associated with most ad topics in the reviews could be confirmed in the
corresponding apps. This further validates the conclusions of my analysis in Section 4.2. Note
that for those that could not be confirmed, I found the complaints were about functionality
not working properly. For example, "Ads doesn't allow to see the bottom line." This was probably
because the specific device or OS that the users used was not fully compatible with
the corresponding apps.
4.4 Conclusions
Currently, millions of smartphone users download free apps from app stores and developers receive
ad revenue by placing ads in these apps. In fact, ad revenue has become one of the most important
sources for software developers to compensate for the cost of an app's development. As discussed
in this chapter, I carried out experiments on a valid dataset of ad-related user reviews to investigate
which ad aspects are complained about or positively commented on the most. I found that
users complain about the visual aspects of ads more often than the non-visual aspects. Intuitively, more
exposure of mobile ads to end users helps improve the chance of ad impressions/clicks, which increases
ad revenue. But improper exposure is detrimental to the user experience of an app, which in turn
negatively impacts the ad revenue developers receive. App developers thus should carefully make
a trade-off to maximize their ad revenue. In fact, I found users appreciated good aspects (e.g.,
frequency, location) of mobile ads in their reviews.
Based on my study in this chapter, I suggest that when developers design the ad UI during
implementation, it will benefit them to accommodate the following three criteria:
1. ad display: some pages, like the game page, may not be user-friendly places to display ads. A
high frequency of ads across different pages is likely to distract the user's attention, and
thus result in end user complaints;
2. ad format: displaying ads at full-screen size (especially popup ads) at improper times
could cause a negative user experience;
3. visual layout: displaying ads in visually obstructive locations (e.g., the middle of the screen
or close to clickable buttons) could interfere with the user's interaction with the app.
When embedding ads into apps, developers should also pay attention to ad non-UI functions.
In particular, it is not a good idea to display ads in the so-called paid version of an app, since
this is in direct conflict with the expectations of users. Blocking app-level functionality to focus
attention on ads is another design decision that causes complaints from end users. Furthermore,
implementation decisions or a lack of adequate testing that lead to the app crashing or to a
slowdown in the app's execution could also negatively impact the user experience with the app.
My work also shows that ad complaint reviews have statistically significantly lower ratings than
positive ad reviews, which suggests multiple areas for future work. In particular, I plan to correlate
more specific ad aspects with app ratings so as to understand their relationship and identify best
practices with respect to mobile ads. I would also like to carry out controlled experiments and
surveys that allow developers to determine the impact of their ad-related choices on user ratings.
Chapter 5
Quantification and Analysis of UI-related Ad Metrics
In the time span of just a few years, mobile ads have grown to become an important and om-
nipresent part of the mobile app ecosystem. Recent studies show that over half of all apps contain
ads [91], and these ads collectively generate an enormous amount of revenue for mobile app de-
velopers. As a case in point, a short decade ago, total mobile ad spending was just 320 million
dollars. By 2014, the mobile advertising industry's revenue topped 49.5 billion dollars [88], and,
by 2018, analysts expect that mobile ads will drive 75% of all digital ad spending and achieve a
20% increase from 2017 to over 70 billion dollars [28].
Mobile ads represent a seemingly straightforward way for developers to monetize their development
efforts. Developers place ads in their apps and then receive a small payment when
these ads are viewed or clicked. However, there are hidden costs to deploying ads: over 70% of
users find ads "annoying" [22], and excessive resource usage incurred by ads can be a source of
user complaints and bring down the overall rating of an app [44]. In the hyper-competitive app
marketplace, even a small change in an app's rating can cause it to disappear from the first page
of search results, which can significantly reduce the app's chance of success [59].
For mobile app developers, the usage of ads represents a difficult balancing act. On the one
hand, developers seek to maximize revenue by displaying ads wherever possible. On the other
hand, developers must be careful not to use ads in a way that will significantly degrade the user
experience, as this can reduce an app's chances for success. Mobile Ad Networks (MANs) and app
developer blogs have attempted to provide guidance to developers on this topic [39, 10, 19, 78, 111].
However, this advice is generic in nature, for example, warning developers to avoid "too many"
ads or not to overlap ads with other elements. Although helpful to some extent, this advice does
not provide developers with guidance as to what could constitute too many ads, for example, or
what sizes or positions of ads are likely to be found more annoying than others. Even when specific
topics of common ad complaints are known (e.g., see Chapter 4), developers lack information about
Table 5.1: Identified ad-related UI complaint topics and their ratios and example reviews

Topic                Ratio (%)   Example review from the dataset
Ad format (RQ1)      19          "...I'm in the middle of playing and ads pop up..."
Frequency (RQ2)      45          "...They've added soooooo many ads..."
Size (RQ3)           9           "...Ads with slots covering the screen..."
Position (RQ4)       19          "...I can't read the quotes with the ads in my way..."
Landing page (RQ5)   11          "...As soon as I opened it, bang! Hit with an Amazon ad..."
Content (RQ6)        5           "...i get spammed by the same video ad over and over..."
whether these complained-about aspects could actually influence their app's rating. Therefore,
developers have to guess at user preferences or must react to complaints that appear in reviews,
a point in time that is too late to prevent a drop in the app's ratings.
To address these limitations of ad-related guidance, I conducted a systematic analysis of ad
usage in mobile apps. I first developed different kinds of program analyses to measure aspects of
ad usage related to common topics of ad-related complaints in mobile app reviews. I then used
statistical analyses of these measurements and the apps' ratings to determine if more concrete
guidance could be provided to developers as to what kinds of ad usage patterns were most likely
to be viewed negatively by end users. For example, how many ads are "too many"? I also
developed metrics to quantify the impact of certain kinds of ad usage, which gave me insight
into the relative importance of the different ad usage patterns. The results of my investigation
were illuminating, and I was able to identify statistically significant differences in the ratings of
apps that had different kinds of ad usage. Many of my findings suggest specific guidelines that
could be provided to developers to help minimize the likelihood of ad-related complaints and to
recognize the types of usage that are generally associated with lower ratings. To the best of my
knowledge, my analysis is the first effort to generate such guidelines based on empirical
analysis of reviews and apps, and it can help developers improve their understanding of how to
better use ads in their apps.
The rest of this chapter is organized as follows. In Section 5.1, I describe results from Chapter 4
that identified common UI-related ad topics and how they motivated the selection of the research
questions I pursued in my investigation. Then, in Section 5.2, I explain the dataset (apps) that I
used in my case study and the design of my experimental framework. I present details and results
of the experiments for each research question in Section 5.3, and discuss threats to validity in
Section 5.4. Finally, in Section 5.5, I summarize my findings.
5.1 Motivation for the Research Questions
To identify different ad usage patterns for my investigation, I used my previous work [45] in
Chapter 4, which examined the topic and frequency of UI-related ad complaints in mobile app
reviews. In that study I manually analyzed a statistically valid sample of 400 ad-related reviews
and classified the complaints into 16 categories. My study found that ads that visually interfered
with an app's UI (e.g., through their position or frequency) were the most likely to trigger complaints.
The list of topics and their frequency generated in Chapter 4 formed an initial pool of po-
tential ad usage patterns to investigate in this work. I focused on ad complaint topics that were
amenable to automated large-scale analysis in that their presence and values could be automat-
ically measured at a large scale. The six topics that I chose for my investigation were: ad type
(i.e., banner or interstitial); frequency (how often ads appear in an app); location and size of the
ads with respect to an app's UI; landing page (whether ads appear on the initial loaded screen of
an app); and content (whether the same ads are repeatedly shown to users). Together, these six
topics represent just over 84% of the UI-related ad complaint topics found in Chapter 4. Table 5.1
shows the relative frequency of each of these topics in my previous study and example reviews
from the dataset. Note that the sum of these ratios is over 100%, since some reviews contained
more than one complaint. For each of these topics I formulated a research question (RQ) aimed
at identifying if there was a relationship between the topic and user provided ratings. Below, I
present these RQs in more depth.
RQ 1: What is the relationship between ad format and app ratings?
In general, there are two types of ads that can be included in an app by a developer. Ads
that occupy the full screen are called interstitial ads and require the user to view the ad for a
time interval or click a close button to return to the app. Users often call these "popup" ads,
since they can appear in the middle of the screen at runtime. Ads that occupy less than a full
screen are generally referred to as banner ads. These typically show up as horizontal boxes that
span the screen's width, but only occupy a portion of the vertical space available on the screen.
They allow a user to interact with the other visible parts of the app's UI while the ad is being
displayed. The decision of which type of ad to utilize is challenging for developers. On the one
hand, interstitial ads generally have a higher payout than banner ads [14] but may more directly
interfere with an end user's interactions while they are displaying. On the other hand, banner ads
are comparatively less intrusive but are continuously present in a UI, potentially becoming more
annoying to end users. In answering this RQ, I investigate whether the usage of one type of ad
versus the other can lead to differences in the ratings given to an app.
RQ 2: What is the relationship between ad frequency and app ratings?
MANs pay developers based on the number of ad clicks or ad impressions (views) they have in
their app. Developers are therefore incentivized to do things that might encourage these end user
ad interactions, such as displaying more ads. However, too many ads can negatively impact an end
user's experience and lead to lower ratings for the app. Therefore, it is important for developers
to have concrete information about what frequency threshold is likely to lead to negative ratings.
In this RQ, I investigate whether apps that make use of ads more frequently are likely to have
lower ratings.
RQ 3: What is the relationship between banner ad size and app ratings?
To encourage user interactions with ads, developers may also be tempted to make their ads
larger, so as to attract user attention and possibly promote more clicks. However, as the ad size
becomes larger, it may affect or interfere with a user's ability to interact with an app's UI. For
developers, there is no guidance as to what size ads are more likely to be correlated with lower
app ratings or if larger ads may negatively impact ratings. Therefore, in this research question,
I study several ad size metrics (i.e., height, width and area), to determine if larger ads are more
likely to have lower ratings.
RQ 4: What is the relationship between banner ad position and app ratings?
Developers may place ads anywhere in the UIs of their app. For example, they may use
standard Android UI containers to show ads that are images or WebView components to position
HTML based ads. Odd positions may increase the attention brought to the ad, which could lead
to more clicks. However, ads in certain positions, such as the middle of the page may be disruptive
to the usability of the app. For instance, if an ad is displayed in the middle of the screen, other
elements can only occupy the upper half and lower half of the page. Ads placed at the top or
bottom of the UI may overlap with other UI elements. All of these placements could potentially
cause problems for users and lead to lower app ratings. In this RQ, I investigate whether certain
ad positions are more likely to lead to differences in app ratings.
RQ 5: What is the relationship between ads in the landing page and app ratings?
For developers, placing ads on the landing page of the app may appear to be a sure-fire way
to increase the likelihood that a user will click or interact with the ad. However, it is unclear
what kind of reaction users have to this type of ad placement. On the one hand, they may find
such a short display to be inconsequential to their user experience; on the other hand, they may
rate apps that use this form of advertisement more negatively. In this RQ, I investigate whether
the use of ads on landing pages is likely to lead to differences in app ratings.
RQ 6: What is the relationship between repeated ad content and app ratings?
Repeated ad content can help to engage users with ads by increasing the likelihood that
users see and/or interact with ads. However, repeated ad content is also a common topic for ad
complaints. In this RQ, I examine whether repeated ad content can actually lead to a difference
in app ratings.
5.2 Methodology
To address the RQs defined in the previous section, I designed a large-scale study to determine
if certain ad usage patterns had a relationship with the containing app's rating. In this section,
I describe the parts of the experiment protocol that are common to all of the RQs. At a high
level, my general methodology was as follows. I began by selecting a subset of apps from the
Google Play app store (Section 5.2.1). Then, I ran a generated workload on each of the subject
apps (Section 5.2.2). While each app was running, I collected periodic screenshots and detailed
information about the layout of the app's UI (Section 5.2.3). After doing this for all apps, I
performed an analysis of the collected information to identify the ad-related metrics relevant
for each of the RQs (details specific to each RQ are presented in Section 5.3, but summarized in
Section 5.2.4). Finally, I statistically analyzed the collected metrics to determine their relationship
with user ratings (Section 5.2.5).
5.2.1 Subject App Selection
I obtained my subjects from a collection of apps downloaded from the Google Play mobile app store
over a 20-month period. I built this collection by crawling the Google Play store and downloading
the top 400 ranked apps (as ranked by Distimo [5]) in 27 of the 36 officially recognized app
categories, along with their associated meta-data, such as ratings, app versions, and user reviews. Overall,
I downloaded 10,750 apps, excluding only categories for which it was not possible to generate
automated deterministic workloads (e.g., games). I identified a randomly sampled subset of this
collection that (1) invoked an ad network service, to ensure there would be ad-related behaviors
to measure, (2) had valid ad profile IDs, to ensure that the app would retrieve ads when it was
run, (3) spanned different categories, to improve the generalizability of the results, and (4) could be
run using PUMA [50], a tool to automatically interact with mobile apps and record UI-related
information, to ensure that the analysis could be automated. My final subject pool contained 773
apps that met these criteria, with 12,866,120 user reviews associated with those apps.
5.2.2 Workload Generation
For each of the apps, I generated a workload so that I could collect information about how the
app's ads were used and displayed at runtime. To generate the workloads, I used PUMA, which is
a mobile app UI crawler framework built on top of the Monkey tool from Google. PUMA crawls
through a mobile app's UI by triggering events and clicking on elements in the UI to transition
to new UIs and activities. Previous studies [15, 50] have shown that PUMA is generally able to
achieve high activity coverage (over 90%) for apps that it successfully interacts with, meaning
that most of the different UIs are visited. This is especially important for my study, as most ads
are displayed and configured on a per-activity basis, so using the PUMA approach gave us high
confidence that we would see most of the different ad-related behaviors in the app.
During a crawl, PUMA ensures that all interactions remain within the app (e.g., if a link to
the external app or link is followed, PUMA returns to the original app). I configured PUMA to
interact with each app for four minutes. I chose this amount of time because even the slowest
possible ad refresh rate is less than two minutes (e.g., 30-120 seconds for AdMob), and four
minutes allowed for several ad refresh cycles. Whenever a new page (i.e., activity) was visited, the
ad in the page was typically refreshed. Therefore even with the slowest refresh setting, several sets
of ad screenshots and UI layouts could be collected. I extended PUMA so it could more effectively
navigate through apps that contained ads. For example, it could navigate past interstitial ads,
which block out an app's other UI elements and freeze the screen for an amount of time until a
close button is clicked. To do this I implemented heuristics to guide PUMA to search for a close
button and click on it before carrying out any further interaction with interstitial ads. In some
cases my extensions could not always navigate through such scenarios successfully. I could detect
this situation by analyzing the collected data, as described in Section 5.2.3, which allowed me
to discard the app's data from the collected metrics. All workloads were executed on a Google
Nexus 5 smartphone running Android 5.1, with a screen resolution of 1776 x 1080 pixels.
5.2.3 Data Collection
During an app's execution, I gathered two types of information: screenshots and UI layouts. To do
this, I modified PUMA so that a separate thread ran the UIAutomator (UIAT) [40], which is part
of the Android Testing Support Library. Every 0.35 seconds the UIAT would take a screenshot
of the app's UI and generate an XML layout le that described the structure of the current UI
in terms of its components' location and size. The result of this data collection was a sequence of
layout file/screenshot pairs obtained from running each app's workload.
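The capture loop can be approximated with plain adb commands as sketched below; this is only an illustration of the idea, not the actual PUMA/UIAutomator extension used in the study, and the file names are hypothetical.

import subprocess, time

def capture_pair(index):
    # Dump the current UI hierarchy (component locations and sizes) to XML.
    subprocess.run(["adb", "shell", "uiautomator", "dump", "/sdcard/ui.xml"], check=True)
    subprocess.run(["adb", "pull", "/sdcard/ui.xml", f"layout_{index:04d}.xml"], check=True)
    # Take a screenshot of the current UI.
    subprocess.run(["adb", "shell", "screencap", "-p", "/sdcard/ui.png"], check=True)
    subprocess.run(["adb", "pull", "/sdcard/ui.png", f"screen_{index:04d}.png"], check=True)

for i in range(int(4 * 60 / 0.35)):   # roughly a four-minute workload
    capture_pair(i)
    time.sleep(0.35)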
I then implemented automated scripts that analyzed the pairs and removed those that represented
cases where PUMA had (1) gotten "stuck" on a UI and was unable to advance to another
screen (i.e., the screenshots were identical until the end of the execution) or (2) taken a snapshot
while the app's UI was in transition (i.e., a completely black screen). For each remaining pair, the
scripts then identified the ads by analyzing the layout file and identifying all WebView components that
matched the size constraints of either a banner or interstitial ad. Due to the unique size attributes
of the ads and the fact that the underlying ad networks in my subject apps almost exclusively
defined ads in WebView components, it was possible to accurately identify ads with this heuristic.
A manual examination of 1,686 randomly selected ad identifications showed 94% accuracy. The
extracted ads, layout files, and screenshots were then analyzed to extract the relevant metrics for
each RQ, as described in Section 5.3. A particular advantage of my data collection mechanism is
that it worked equally well for apps with and without ad library obfuscation, a mechanism that
has limited app ad analysis in prior work (e.g., [77, 100]).
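A simplified sketch of the ad-identification heuristic is shown below; the exact size constraints are not listed in the text, so the thresholds here (near-full-screen WebViews as interstitials, short full-width WebViews as banners) are assumptions for illustration only.

import re
import xml.etree.ElementTree as ET

SCREEN_W, SCREEN_H = 1080, 1776   # Nexus 5 resolution used in the study

def parse_bounds(bounds):
    # UIAutomator encodes bounds as "[x1,y1][x2,y2]"; return the width and height.
    x1, y1, x2, y2 = map(int, re.findall(r"\d+", bounds))
    return x2 - x1, y2 - y1

def find_ads(layout_file):
    ads = []
    for node in ET.parse(layout_file).iter("node"):
        if node.get("class") != "android.webkit.WebView":
            continue
        w, h = parse_bounds(node.get("bounds", "[0,0][0,0]"))
        if w >= 0.9 * SCREEN_W and h >= 0.9 * SCREEN_H:
            ads.append(("interstitial", w, h))   # assumed full-screen threshold
        elif w >= 0.9 * SCREEN_W and h <= 0.2 * SCREEN_H:
            ads.append(("banner", w, h))         # assumed banner-strip threshold
    return ads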
5.2.4 Analysis to Identify Ad Usage
The goal of the analysis step is to separate the subject apps into two sets, one that exhibits
the ad usage pattern and the other one that does not. There are two seemingly straightforward
approaches to do this based on the contents of the ad reviews; however, both approaches have
problems that would introduce significant threats to validity. The first approach would be to
create one group that contains only apps that have reviews complaining about the ad aspect and
the second containing only apps that comment favorably on the ad aspect. This is unfortunately
unrealistic to do because reviewers rarely, if at all, comment favorably on ads or, broadly, on
aspects of an app that do not cause problems. In fact, reviews that explicitly mention ads are
overwhelmingly negative (see Chapter 4). Building on this, a seemingly viable second approach
would be to group apps that have a complaint mentioning the pattern into one set and then group
the rest into the other set. There are two issues with this approach. First, the latter set could
still contain apps with the ad usage pattern, if the pattern is not mentioned in any review.
Second, splitting the apps into two groups, where I knowingly have only apps with complaints
about the pattern in one group, would inherently bias the results against the pattern by leaving
out apps that may have the pattern and have no complaints.
Therefore, instead of depending on the reviews, I designed and developed analyses to automatically
identify the pattern by looking at the data collected for each subject app during its
workload execution. My analyses used this data to divide the apps into two groups, S1 and S2,
where one group represented apps with the aspect that I wanted to evaluate and the other group
did not. For example, in RQ1 I compared apps that had banner ads (S1) against apps that had
interstitial ads (S2). The specific analyses varied by RQ, and are explained in more detail in
Section 5.3, but were generally based on analysis of the code, layout XML file, and screenshots.
For ad patterns where the metric was continuous (e.g., size), I added an additional step to sort
apps into either S1 or S2. For these RQs, I ran a K-means clustering (with K = 2) on the apps
based on the relevant metric for the RQ and used the two resulting clusters as the two S sets.
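As a sketch, this clustering step for a continuous metric can be expressed as follows; the study specifies only K-means with K = 2, so the use of scikit-learn here is an assumption about tooling.

import numpy as np
from sklearn.cluster import KMeans

def split_into_s1_s2(app_ids, metric_values):
    # Cluster apps on a single continuous ad metric and return the two S sets.
    x = np.asarray(metric_values, dtype=float).reshape(-1, 1)
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(x)
    s1 = [app for app, label in zip(app_ids, labels) if label == 0]
    s2 = [app for app, label in zip(app_ids, labels) if label == 1]
    return s1, s2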
5.2.5 Statistical Analysis
For each RQ, I performed a statistical analysis to determine if there was a difference in ratings
between the two sets, S1 and S2. Note that a user's rating is an integer value between one and
five that is assigned to an app by a user. An "app's rating," as reported by an app store, is based
on the average score of these user ratings. For each RQ, I compared the two sets of apps based
on the distribution of their app ratings. Note that all of the app ratings used in this calculation
corresponded to the version of the app that was analyzed in the study.
For each comparison of the rating distribution, I calculated the statistical significance of the
result using the one-tailed Mann Whitney U (MWU) test. I used this test because it does not
assume that the measured data has a normal distribution. The output of the MWU test is a
p-value. In my case, when the p-value is less than 0.05, I can conclude that the difference between
the distribution of the evaluation metric for each set of apps is statistically significant. However,
when the p-value is greater than 0.05, this means that I do not have sufficient evidence to support
a claim either way. I use the standard p-value of 0.05 without modification (e.g., Bonferroni)
because, although my hypotheses use the same set of subject apps, the actual sets of apps that
comprise each S1 and S2 are different for each RQ; therefore, my hypotheses do not represent a
family of hypotheses that may require p-value corrections.
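A minimal sketch of this test, assuming SciPy's implementation of the one-tailed Mann Whitney U test, is:

from scipy.stats import mannwhitneyu

def ratings_differ(ratings_s1, ratings_s2, alpha=0.05):
    # One-tailed test: are ratings in S1 stochastically greater than in S2?
    _, p_value = mannwhitneyu(ratings_s1, ratings_s2, alternative="greater")
    return p_value, p_value < alpha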
Once I determined that a difference in rating was statistically significant, I was also interested
in quantifying the impact of the difference in rating. Traditional effect size analysis (e.g., Cohen's
d effect size [21]) would not be informative in this situation since all the apps under consideration
have a rating between 4 and 5 stars, i.e., a very narrow margin. Furthermore, I wanted to
quantify this impact in a way that would be meaningful for app developers. Therefore, I used
the change in app ranking represented by the difference in median ratings between the two sets,
S1 and S2, as a measure of impact. This measure corresponds to the difference an app developer
could expect to see if an app either exhibited or did not exhibit the ad-related aspect. Then for
each RQ, I used the distribution of the ratings to compute the corresponding expected increase
or decrease in percentile ranking an ad-related aspect could have on an app.
Figure 5.1: Cumulative frequency distribution (CFD) of star ratings among 10,750 apps: the X axis represents the app rating from one to five stars, the Y axis represents the CFD in terms of percentage, and two example points (in pink and light blue) are shown on the distribution curve
Figure 5.1 shows the cumulative frequency distribution (CFD) of app ratings for all of the subject apps (i.e., the
set of 10,750 apps). In the figure, the X axis represents the app rating, and the Y axis the
cumulative frequency. Specifically, for a rating r on the X axis, the corresponding CFD value on the Y
axis represents the percentage of observed app ratings R that take a value less than or equal
to r. In other words, this frequency represents the rank of an app that has the rating r. Assume
there were two apps (corresponding to S1 and S2 in the figure) whose ratings were 3.0 and 4.0, respectively.
Their cumulative frequencies on the Y axis would be 0.2 and 0.6, respectively. This indicates that, among
the whole dataset of 10,750 apps, there were about 4,000 apps with ratings between these two
apps (i.e., a 4,000 change in ranking).
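The ranking-change computation can be sketched as follows; the empirical CFD here is a straightforward sort-and-count over the app ratings, which is an assumption about the implementation rather than a description of the study's actual scripts.

import numpy as np

def ranking_change(all_ratings, rating_a, rating_b):
    # Percentile change and rank change implied by two (median) ratings.
    r = np.sort(np.asarray(all_ratings, dtype=float))
    cfd = lambda x: np.searchsorted(r, x, side="right") / len(r)
    pct = abs(cfd(rating_a) - cfd(rating_b))
    return pct, int(round(pct * len(r)))

# For the example above, ratings of 3.0 and 4.0 (CFD values of 0.2 and 0.6)
# over the 10,750-app dataset give a 40-percentile change, or about 4,000
# positions in ranking.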
5.3 Results
In this section, I present the results for each of the six RQs. For each RQ, I first begin by
describing the analysis I designed to identify and/or quantify the ad usage pattern. For example,
in RQ1, I describe the analysis I designed to determine if an app displays banner or interstitial
ads. Then I present the results of the statistical analysis, which comprises the MWU test for
determining statistical significance; relevant statistical information, such as the median of the
average star rating for each set of apps; and the potential impact of the difference in median
ratings. In Section 5.4, I discuss additional analyses I performed over the results to identify or
rule out threats to the validity of these results. In Section 5.5, I compare my results with guidelines
issued by AdMob, one of the largest and most popular Android MANs.
5.3.1 RQ1: What is the relationship between ad format and app ratings?
Approach: I developed an analysis to divide the subject apps into two categories based on
their usage of the two ad formats: apps with only banner ads and apps with interstitial ads.
To do this, my analysis identified the UI widget container holding the ad and determined its
size. If an ad matched one of the banner sizes allowed by the MAN, the ad was counted as a
banner ad. For example, Google AdMob allows developers to specify one of several different sizes,
such as FULL_BANNER, LARGE_BANNER, and SMART_BANNER, and the sizes of these are
either fixed or can be calculated as a ratio of the device's screen size. If the size of an ad was
approximately the size of the full screen, the ad was identified as an interstitial ad. In the
category with interstitial ads, I included apps that contained only interstitial ads or both types
of ads, since the goal of my analysis was to determine if there was any difference in ratings due to
the use (or presence) of an interstitial ad.
Results: In this question I examined the following hypothesis, H1: Do apps with only banner ads
have higher ratings than those with interstitial ads? The set with only banner ads was comprised
of 620 apps and had a median rating of 4.20 stars. The set with interstitial ads was comprised of
153 apps and had a median rating of 4.16 stars. The difference was 0.04 stars and was statistically
significant with the p-value 0.044. This difference corresponds to a 3-percentile change in rating
distribution and a 323 change in ranking.
Summary: There is sufficient statistical evidence to support the finding that apps with only
banner ads are more likely to have higher ratings than apps that contain an interstitial ad.
Discussion: Among user reviews about ad format, most were complaints about ads popping up
frequently or lasting for a long period of time. For example, one user complained, "Ads pop up
every 2 seconds. Covered with pop up ads." Another complained, "Ad heavy and flaky, 30 second
ads are a bit much..." As described in Section 2.2, interstitial ads are displayed as pop-up ads,
and in many cases are video ads. Since they occupy the full screen, users typically need to
interact with these ads before accessing app content. Banner ads, in contrast, do not impose
this interruption. Hence, one possible reason could be that interstitial ads block users' normal
interaction with apps due to their inappropriate frequency and duration.
5.3.2 RQ2: What is the relationship between ad frequency and app
ratings?
Approach: To address this research question, I designed analyses to collect two different metrics
from the subject apps. The first metric was the ratio of the number of an app's pages that contain
ads to the total number of pages in the app. The second metric was the number of ads displayed
simultaneously on each page of an app. For the first metric, I used K-means clustering
to divide the apps into two subsets: apps with a higher ratio and apps with a lower ratio. For
the second metric, my analysis determined whether one or two ads were displayed simultaneously
in a screenshot and grouped each app based on this number. Note that no more than two ads
were ever displayed simultaneously in any app's set of screenshots.
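A minimal sketch of these two metrics, assuming the number of ads detected on each of an app's pages has already been extracted from the layout files, is:

def ad_frequency_metrics(ads_per_page):
    # ads_per_page: one entry per app page with the number of ads detected on it.
    if not ads_per_page:
        return 0.0, 0
    pages_with_ads = sum(1 for count in ads_per_page if count > 0)
    ad_ratio = pages_with_ads / len(ads_per_page)
    max_simultaneous = max(ads_per_page)
    return ad_ratio, max_simultaneous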
Results: The two hypotheses that I examined in this question were H2.1: Do apps with a higher
ratio of ads have lower ratings? and H2.2: Do apps with more ads in a screenshot of the app have
lower ratings? I evaluated both hypotheses and found the following results:
1. Ad ratio: Cluster 1 represented the set of apps with a higher ratio. There were 407 apps
in this cluster and the median rating was 4.17 stars. For the cluster, the mean ratio was
0.77, the median 0.77, and the standard deviation 0.15. Cluster 2 represented the set of
apps with a lower ratio. There were 366 apps in this cluster and the median rating was 4.20
stars. For the cluster, the mean ratio was 0.25, the median 0.25, and the standard deviation
0.14. The difference was 0.03 stars and was statistically significant with the p-value 0.014.
This difference corresponds to a 4-percentile change in rating distribution and a 430 change
in ranking.
2. Number of ads per page: The set with only one ad displayed per page was comprised of 737
apps and had a median rating of 4.19 stars. The set with two ads displayed on the same
page at the same time was comprised of 36 apps and had a median rating of 4.06 stars.
The difference was 0.13 stars and was statistically significant with the p-value 0.003. This
difference corresponds to a 16-percentile change in rating distribution and a 1,720 change
in ranking.
Summary: My findings indicate that apps that show ads more frequently, whether overall across
activities or per activity, are generally rated lower by users.
Discussion: To analyze users' attitudes about ad frequency, I manually examined the corresponding
user reviews. I found most were simply complaints about ads being displayed frequently. For
example, one user complained, "Ads everywhere. After each click you get a new ad..." Another
complained, "Too many ads. Ads popped up too often for decent game play..." Among these
complaints, no further findings could be extracted except that people just did not like too many
ads. However, it would be interesting to study how many ads are too many for end users.
5.3.3 RQ3: What is the relationship between banner ad size and app
ratings?
Approach: To answer this research question, I developed analyses to collect size metrics for the
banner ads in each of the apps. These metrics were the width, height, and area of the ad rectangle
displayed in the UI. If an app had multiple banner ads with different sizes,
I calculated the mean value for each of the three dimensions and used that value to characterize
the app. I then applied K-means clustering to each of these dimensions and used the clustering
results to divide the apps into two groups per metric. Note that I used the same device, a Google
Nexus 5 smartphone, for all of the measurements, so there was no need to normalize the metrics
as a percentage of screen size. Since this RQ focused on banner ads, I did not consider apps with
only interstitial ads; therefore this RQ analyzed 725 apps.
Results: I evaluated whether apps that had larger ads in terms of width (H3.1), height (H3.2),
and area (H3.3) had higher or lower ratings than apps with smaller ads. The individual results
for the three metrics are as follows:
1. Width of an ad: Cluster 1 represented the set of apps with a greater width. There were 305
apps in this cluster and the median rating was 4.17 stars. For the cluster, the mean width
was 1093p, the median 1080p, and the standard deviation 96p. Cluster 2 represented the set
of apps with a smaller width. There were 420 apps in this cluster and the median rating was
4.20 stars. For the cluster, the mean width was 957p, the median 960p, and the standard
deviation 50p. I found no statistically significant difference between these two clusters.
2. Height of an ad: Cluster 1 represented the set of apps with a greater height. There were
38 apps in this cluster and the median rating was 4.06 stars. For the cluster, the mean
height was 366p, the median 360p, and the standard deviation 54p. Cluster 2 represented
the set of apps with a lower height. There were 687 apps in this cluster and the median
rating was 4.20 stars. For the cluster, the mean height was 152p, the median 150p, and
the standard deviation 12p. The difference was 0.14 stars and was statistically significant
with the p-value 1.49e-03. This difference corresponds to a 15-percentile change in rating
distribution and a 1,613 change in ranking.
3. Area of an ad: Cluster 1 represented the set of apps with a greater area. There were 36
apps in this cluster and the median rating was 4.05 stars. For the cluster, the mean area was
389090p^2, the median 383520p^2, and the standard deviation 54199p^2. Cluster 2 represented
the set of apps with a smaller area. There were 689 apps in this cluster, and the median
rating was 4.20 stars. For the cluster, the mean area was 153378p^2, the median 144000p^2,
and the standard deviation 16637p^2. The difference was 0.15 stars and was statistically
significant with the p-value 4.01e-04. This difference corresponds to a 17-percentile change
in rating distribution and a 1,828 change in ranking.
Summary: I found no statistical evidence of a difference in ratings for apps with respect to
width. Instead, my evidence indicates that apps with large and small ads (with respect to
width) are present across a wide variety of ratings. However, I did find that ads with a
smaller height or area typically appear in more highly rated apps.
Discussion: Ads with a larger height or area compress the screen area that is available for the user's
interaction with the app. This is more likely to be noticed by the user than a difference in ad
width. Among user reviews about banner ad size, the height and area of ads were the aspects most
complained about. For example, one user complained: "Ads taking up entire screen.
Happens intermittently but the bottom banner ad keeps expanding so you can't see any of the map."
Hence, it would be worthwhile to conduct a set of experiments to investigate users' acceptance of mobile
ads with different heights, areas, or widths.
5.3.4 RQ4: What is the relationship between banner ad position and
app ratings?
Approach: My goal in this research question was to evaluate the relationship between ad position and
app ratings. For this RQ I only considered apps with banner ads, since interstitial ads occupy the
entire screen and, by definition, their position cannot be altered. For each UI, I considered three
potential ad positions: top, middle, and bottom of the screen. I developed an analysis to divide
the apps with banner ads into three groups, each corresponding to one of the three ad positions.
To determine the relative position of the ad, the analysis compared the x-y coordinates of each
ad-containing view object with the device's screen dimensions. To simplify the analysis, I did
not include apps in this RQ if they had ads in multiple positions. For example, if an app had ads
in both the top and middle of the screen, the app was not included in any of the three groups.
There were 115 such apps.
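A simplified sketch of the position analysis is shown below. The study compared each ad's coordinates with the screen dimensions; the specific cut-offs used here (thirds of the screen height) are illustrative assumptions.

def ad_position(ad_top, ad_bottom, screen_height):
    # Label a banner ad as top, middle, or bottom by its vertical center.
    center = (ad_top + ad_bottom) / 2.0
    if center < screen_height / 3.0:
        return "top"
    if center > 2.0 * screen_height / 3.0:
        return "bottom"
    return "middle"

def app_position(positions):
    # Apps whose ads appear in more than one position are excluded from this RQ.
    unique = set(positions)
    return unique.pop() if len(unique) == 1 else None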
Results: I analyzed the resulting sets to determine if apps with banner ads in the top, bottom,
or middle of the screen had higher or lower ratings. I compared the ratings of apps with ads in
the top to apps with ads in the bottom (H4.1), apps with ads in the top to apps with ads in the
middle (H4.2), and finally apps with ads in the middle to apps with ads in the bottom (H4.3). I
present the results for each comparison below.
The set S1 with ads in the top was comprised of 114 apps and had a median rating of 4.31
stars. The set S2 with ads in the bottom was comprised of 380 apps and had a median rating of
4.19 stars. The set S3 with ads in the middle was comprised of 114 apps and had a median rating
of 4.15 stars. I obtained statistical significance for both H4.1 and H4.2. The difference between S1
and S2 was 0.12 stars and was statistically significant with the p-value 6.43e-06. This difference
corresponds to a 15-percentile change in rating distribution and a 1,613 change in ranking. The
difference between S1 and S3 was 0.16 stars and was statistically significant with the p-value
6.86e-06. This difference corresponds to a 19-percentile change in rating distribution and a 2,043
change in ranking. However, I did not find any statistically significant difference between S2 and S3.
Summary: Apps with ads at the top of the UI are typically rated higher than apps with ads
that appear in either the bottom or middle of the UI.
Discussion: Among user reviews about ad placement, most were complaints about ads overlapping
or being close to an app's functional area. For example, one user complained, "Ad overlaps control
buttons." Hence, one possible reason could be that both bottom and middle ads are close to functional
areas that users commonly click on. In particular, the device menu buttons are typically
beneath the bottom of the display screen, while key app functions or content are in the center of
the screen. Both locations are more likely to be interacted with and hence noticed by end users.
The potential reasons ads are placed close to clickable elements merit further study.
5.3.5 RQ5: What is the relationship between ads in the landing page
and app ratings?
Approach: To address this research question, I developed an analysis to identify apps whose
landing page displayed a banner or interstitial ad right after launch. To do this, the analysis
examined the first UI layout and screenshot in the timeline for each of the apps and determined
whether there was an ad-related view object displayed.
Results: The hypothesis I tested was H5: Apps with ads in the landing page have a lower rating
than apps that do not. The set with ads in the landing page was comprised of 271 apps and had a
median rating of 4.10 stars. The set that had ads but not in the landing page was comprised of 502
apps and had a median rating of 4.21 stars. The difference was 0.11 stars and was statistically
significant with the p-value 1.08e-05. This difference corresponds to a 13-percentile change in
rating distribution and a 1,398 change in ranking.
Summary: I conclude that apps with ads in the landing page are more likely to have lower
ratings than apps with ads that do not appear in the landing page.
Discussion: The landing page of an app provides the user's first impression of the app. Hence,
ads on the landing page inevitably become part of this first impression. Many users complained
that ads on the landing page were unexpected, such as "Ads are annoying... I really enjoyed it
until I recently updated it, and now the ads are the first message seen..." It has been shown that the
initial judgment users form on the first page is an essential condition for the success of a website
[105]. In fact, it takes only 50 ms for users to form aesthetic impressions of web page designs [34, 71].
It would be valuable to study whether the landing page of an app has the same impact on
the app's success as seen on the web platform.
5.3.6 RQ6: What is the relationship between repeated ad content and
app ratings?
Approach: The goal of this question was to determine if there was a relationship between ratings
and the same ad being shown to users while they interact with an app. Note that I
analyzed only banner ads, not interstitial ads, because according to the AdMob unit
profile constraints [36], only banner ads can be set to have no refresh, i.e., to show the same ad
repeatedly. I designed an analysis to extract the ad images from each screenshot, using the
ad position and size specified in the layout file, and then calculate a "uniqueness ratio" for the
ad images. My analysis computed this ratio by identifying all the unique ad images for a given
app and then dividing this number by the total number of ad images for the app. To determine
if an ad image was unique, the analysis used a pixel-based color comparison algorithm. For the
comparison, the analysis calculated the overall color difference on a pixel-by-pixel basis for each
pair of ad images that had the same size. If the difference ratio was over 0.1, the two ad images
were considered distinct. Next, I used K-means clustering to define two groups of
apps: those with a greater uniqueness ratio and those with a smaller uniqueness ratio.
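A sketch of the uniqueness-ratio computation follows. The pixel-based comparison here uses the mean absolute per-channel difference as a stand-in for the study's color-difference measure, and images of different sizes are treated as distinct; both choices are assumptions.

import numpy as np
from PIL import Image

def distinct(path_a, path_b, threshold=0.1):
    # Pixel-wise color comparison for two cropped ad images.
    a = np.asarray(Image.open(path_a).convert("RGB"), dtype=np.int16)
    b = np.asarray(Image.open(path_b).convert("RGB"), dtype=np.int16)
    if a.shape != b.shape:
        return True
    diff_ratio = np.mean(np.abs(a - b)) / 255.0
    return diff_ratio > threshold

def uniqueness_ratio(ad_image_paths):
    unique = []
    for path in ad_image_paths:
        if all(distinct(path, seen) for seen in unique):
            unique.append(path)
    return len(unique) / len(ad_image_paths) if ad_image_paths else 0.0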
Results: The hypothesis analyzed in this RQ was as follows, H6: Apps that have a lower ratio of
unique ads have a lower rating than apps with a higher ratio of unique ads. Cluster 1 represented
the set of apps with a higher ratio. There were 401 apps in this cluster and the median rating
was 4.21 stars. For the cluster, the mean ratio was 0.79, the median 0.75, and the standard
deviation 0.16. Cluster 2 represented the set of apps with a lower ratio. There were 322 apps in
this cluster and the median rating was 4.16 stars. For the cluster, the mean ratio was 0.34, the
median 0.35, and the standard deviation 0.14. The difference was 0.05 stars and was statistically
significant with the p-value 7.79e-03. This difference corresponds to a 6-percentile change in rating
distribution and a 645 change in ranking.
Summary: I conclude that apps with more frequently repeated ads are typically rated lower
than other apps.
Discussion: Among user reviews about ad content, many were complaints about ads being repeated
during users' interaction with the app. For example, one user complained, "...KFC ad over and
over and over.... Not worth having." Another complained, "Ads in videos. on some videos the ad
will start repeating. plz fix." After further investigation, I found no common pattern associated
with these ads. As with the ad frequency aspect in RQ2, users simply complained about repeated
ads. However, it would be meaningful to analyze users' attitudes toward the degree of repetition
(partial or complete repetition).
5.4 Threats to Validity
In this section I discuss my analysis and mitigation of the threats to validity that could exist due
to the size and scope of the empirical study.
External Validity: I designed my study to reduce several possible threats to the general-
izability of my results. To ensure my subject apps and reviews were selected without bias and
were broadly representative, I selected real apps from the Google Play app store that spanned a broad
range of categories. The initial pool of 10,753 apps was independently selected: I downloaded
the top 400 apps (as rated by Distimo [5]) from 27 different app categories, excluding
only categories for which it was not possible to generate automated deterministic workloads, such
as games. From this set, I then automatically identified, using bytecode analysis, the subset of
773 apps that used ads. While no experimental choice can provide perfect generalizability, the
source of the apps and their category distribution helps to avoid any biases that could reduce
generalizability.
Additionally, the choice of the ad aspects that I studied in this chapter was based on my
previous analysis of reviews in Chapter 4. It is likely that there are other ad aspects and metrics
not addressed in this work that may also have a statistical relationship with app ratings. However,
not studying these aspects does not invalidate my findings. Instead, the existence of other ad
aspects points to the possibility that additional guidelines could be identified in future research.
Internal Validity: Threats to internal validity could arise from the choices I made in
the implementation and execution of the experiments in my study. There are three such threats.
(1) I collected ad-related metrics based on a single execution of each app. This may appear to
introduce problems with sampling variation due to random events in the environment. However,
it is important to note that for an app, once the ad profile is configured and the app is published,
most ad-related spatial attributes are fixed, such as position, size, and format. Different runs of
an app will therefore produce the same execution information with respect to the ad-related metrics.
This implies that one run per app is enough to obtain reliable measurements.
(2) Apps could be executed on an unreliable network, which could impact the collection
of ad metrics. To avoid this, I set up my mobile device so that during workload
execution it was connected to a local WIFI router that was itself linked to the university's network.
The conditions of the WIFI network were well maintained (34M download, 36M upload, and
5 ms latency), which minimized any potential impact of delayed ad transmission on the collection
of ad metrics.
(3) The choice of device might influence which types of ads are shown. It is possible
that different devices' screen sizes may lead to different ad aspects. While most visual aspects
analyzed in Section 5.3 are independent of screen size, I reran the analysis with the same set of
subject apps on another device, the Samsung Galaxy SII, which has a screen resolution of 800 ×
480 pixels and Android 4.3 installed. As in my main study, I collected ad metrics for each of
the RQs on the SII. I was able to collect ad metrics for 394 apps. The reason I could not collect
metrics for all of the apps was that some were not compatible with the second device. For example,
some apps crashed before any ad metric could be collected. I then compared the two sets of ad
metrics for each of the RQs. For those metrics that had statistical significance in Section 5.3, over
98% of the measurements had the same results. Note that for the banner size in Section 5.3.3,
I normalized the metrics as a percentage of screen size. Differences in ad metrics were typically
caused by asynchronous behaviors, such as ads not loading in a timely manner or popup windows
checking for updates. This comparison indicates that the choice of device does not impact
my findings.
Construct Validity: One threat to construct validity in my study is the following: are
ratings a reasonable measure of an app's success? A fair question is whether revenue would be a better
measure. Although I think this line of inquiry would be useful, I am limited by the lack of data
on the revenue that developers get from ads. Instead, I see the app's rating and ranking as a proxy
for success, because end users are more likely to download highly ranked apps [51]. Since MANs
pay developers by the number of ad clicks/impressions [37, 38], a larger base of users means
more ad revenue to developers. This assumption has been widely used and supported in related
work [69, 51, 58, 59, 57, 84]. Therefore, in the absence of ad revenue data, I believe that an app's
rating and ranking are reasonable substitutes.
Table 5.2: Comparison between each individual app category and the whole dataset with regard to the
number of apps; the minimum, mean, median, and maximum ratings; and the CFD-percentage
change (last three columns) at three rating points (the 25th, 50th, and 75th-percentile ratings)
for a 0.01 rating difference

ID  App category  # apps  Min  Mean  Median  Max  25%  50%  75%
X0 ALL 10,750 1 3.65 3.83 5 0.27 0.55 0.82
X1 ARCADE 358 2.16 4.03 4.15 4.75 1.68 1.96 2.79
X2 BOOKS AND REFERENCE 350 1.34 3.98 4.17 4.92 0.86 1.43 0.57
X3 BRAIN 343 2.20 4.04 4.15 4.75 0.58 1.75 1.75
X4 BUSINESS 371 1.00 3.28 3.37 5.00 0.81 0.54 1.62
X5 CARDS 348 2.20 3.83 3.90 5.00 0.86 1.15 1.72
X6 CASUAL 341 2.57 3.95 4.05 4.66 0.88 1.47 1.76
X7 COMICS 281 1.42 3.73 3.94 5.00 1.07 1.07 1.42
X8 COMMUNICATION 367 1.42 3.57 3.62 4.58 1.09 1.63 1.09
X9 EDUCATION 367 1.42 3.89 4.05 4.79 1.09 1.91 1.09
X10 ENTERTAINMENT 330 1.45 3.50 3.64 4.73 0.91 1.21 1.21
X11 FINANCE 380 1.48 3.34 3.36 4.77 0.79 1.32 0.53
X12 HEALTH AND FITNESS 369 1.49 3.81 4.04 4.78 1.36 1.08 1.63
X13 LIBRARIES AND DEMO 238 1.00 3.50 3.66 4.99 1.26 1.68 1.68
X14 LIFESTYLE 355 1.45 3.61 3.74 4.83 1.13 1.41 0.56
X15 MEDIA AND VIDEO 293 1.00 3.44 3.59 4.69 1.02 0.68 0.68
X16 MEDICAL 376 1.17 3.52 3.68 4.81 1.33 0.79 0.79
X17 MUSIC AND AUDIO 337 1.66 3.71 3.86 4.78 0.89 1.19 1.48
X18 NEWS AND MAGAZINES 357 1.26 3.28 3.36 5.00 1.12 1.12 1.4
X19 PERSONALIZATION 312 2.28 4.09 4.19 4.81 1.28 1.92 1.6
X20 PHOTOGRAPHY 361 1.14 3.65 3.78 4.86 1.11 1.38 1.11
X21 PRODUCTIVITY 364 1.31 3.74 3.89 4.79 1.09 1.37 1.09
X22 RACING 331 2.48 3.83 3.91 4.70 1.81 1.81 1.81
X23 SHOPPING 328 1.29 3.39 3.53 4.69 0.61 0.61 1.52
X24 SOCIAL 324 1.33 3.42 3.48 5.00 1.23 0.62 1.23
X25 SPORTS 332 1.51 3.58 3.75 5.00 0.9 0.9 0.9
X26 SPORTS GAMES 338 2.02 3.76 3.89 4.82 1.77 1.77 2.6
X27 TOOLS 383 1.17 3.79 3.92 4.78 0.78 1.3 1.04
X28 TRANSPORTATION 371 1.00 3.31 3.37 5.00 0.81 0.81 1.34
X29 TRAVEL AND LOCAL 376 1.27 3.42 3.58 4.74 0.53 2.13 1.07
X30 WEATHER 384 1.36 3.67 3.74 5.00 0.78 1.3 1.3
The second threat to construct validity is that, within a single app category, the rating difference
may not have the same impact on app rankings as in the above RQs, since I performed the rating
impact analysis on the whole dataset across all 30 app categories. To address this threat, I
determined the impact for each of the app categories.
I first calculated the CFD in each app category. Table 5.2 shows the statistical
rating information (X1 - X30) across the 30 different app categories, and Figure 5.2 shows the corresponding
boxplot of ratings. The mean rating across all app categories (the column Mean in the
table) has an average value of 3.65 with a standard deviation of 0.24, and the median rating (the
column Median in the table) has an average value of 3.77 with a standard deviation of 0.25. The
first row (i.e., X0) in the table represents the information for the whole dataset.
Figure 5.2: Boxplot of star-rating distribution across 30 app categories: the X axis represents the star-rating distribution from one to five stars, and the Y axis represents the app category
Then I performed the rating impact analysis in each app category to determine how a rating
difference affects app rankings. To make the analysis results comparable, I used the same
rating difference (0.01) and chose the same rating points in the CFD (the 1st,
2nd, and 3rd quartiles). These rating points outline the CFD and correspond to the
25th, 50th (i.e., median), and 75th-percentile ratings, respectively. At each of these three ratings,
I determined the percentage change (i.e., along the Y axis in Figure 5.1) for a 0.01 rating
difference. This analysis was repeated across the 30 individual app categories (X1 - X30), as well as
the whole dataset (X0). The results are shown in the last three columns of Table 5.2. The column
names 25%, 50%, and 75% represent the three rating points, and the values in each column represent the
percentage change that determines the ranking change. For example, for the app category BRAIN
(row X3 in the table), the 3rd-quartile rating (the column 75%) is 4.39, and the percentage
change in the CFD is 1.75%. This means that a 0.01 increase in the app
rating would correspond to a change of about 6 positions in ranking. By comparing X1 through X30 with X0, we can
see that the rating impact in each individual app category is more pronounced than in the whole dataset.
This indicates that the rating impact analysis in the above RQs was, if anything, an underestimate.
Another threat to construct validity is that it is possible the problems described in ad complaint reviews
did not actually exist in the corresponding apps. To investigate the potential impact of this threat on
my results, I randomly selected 30 apps that had complaints across the six RQs and manually
interacted with them to confirm the complaints. I found that all the ad aspects complained about
by end users were present in the corresponding apps.
Conclusion Validity: There are two possible threats to the validity of my conclusions:
the impact of confounding variables that I did not study in this chapter and the impact
of multicollinearity between the variables I did study. I tried to minimize these
threats either by the design of the experiments or by evaluating their impact via additional
analyses.
(1) The impact of confounding variables: the lower or higher ratings might be because of
other confounding variables in the study and not because of the ads. In my study I have two
confounding variables:
• Quality of the apps: A possible threat is that the quality of the apps has a higher impact
on the ratings of an app. In order to address this threat I designed my experiment to
only include top-ranked apps (i.e., those with a very high rating and very high number of
downloads) and, therefore, I can assume that the apps are all of high quality. This helps
control for the impact of app quality on app ratings. In my study, since all apps have high
ratings, no app can be pointed to as a bad app because of the features that it has.
• Category of the apps: It is possible that all apps in one category use one ad aspect while apps
in another category use another ad aspect, and that all apps in one category have a considerably
higher rating than apps in another category. In that case, the rating difference would be due
to app categories instead of ad aspects. To address this threat, for each RQ, I compared
the categories represented by the apps in each pair of S1 and S2. I found the category
difference between the two sets in each pair was generally in the range of 1 to 3 categories, while the number of
overlapping categories was typically 26. Therefore, the only common thread associated
with different ratings among all the apps in an S set was a particular ad aspect (i.e., not the
category), and any attribute of apps in a category is unlikely to be a confounding variable.
In addition to the above two confounding variables, it is possible that there could be other
confounding variables. In order to more broadly minimize this threat, I took the following two
steps:
• I examined a large dataset of apps (773) to ensure I had many data points in my study
and that the conclusions I arrived at were not based on a few unrepresentative cases. For
example, if I examined just one app for each ad metric, then my conclusions could be
impacted by the features of the app and not necessarily the ads. By considering a large set
of apps I ensure that the only common thread between all the hundreds of apps is a specific
ad metric.
• I performed rigorous statistical analysis to establish the significance of my findings. I did
not just compare medians. I used the MWU test to compare the ratings distribution and
evaluate the impact through a drop in rankings. Hence, any observations that I make have
the same statistical significance as any other scenario where such a test was used.
(2) Multicollinearity between the studied ad aspects: it is possible that one or more of the six
RQs were not independent of each other. To identify collinearity among the RQs, I performed
a redundancy analysis of the results to check if a particular aspect under observation was a linear
combination of the other ad hypotheses (i.e., it was not independent). To implement this analysis,
I used the redun function provided by the Hmisc package [56] of the statistical tool R. The results
of this analysis showed that all the ad metrics, with one exception, were independent of each
other. The one exception was the metric dealing with the area of the ads (H3.3), which was found
to be an almost perfect linear combination of the other two metrics, height and width. As a result
of this finding, I did not include the results of H3.3 in my conclusions.
5.5 Conclusions
Mobile ads represent an important mechanism for allowing developers to monetize their development
efforts. However, mobile ads are generally unpopular and developers must use them
carefully to avoid negatively impacting an app's ratings and success. Unfortunately, best practices
published by MANs and developer blogs are generally too generic to give concrete guidance
to developers. Furthermore, even when frequent topics of complaints are known, developers are
often unsure if the complaints are significant enough or user sentiment strong enough on these
topics to impact their ratings. To address this limitation of current ad-related guidance, I
investigated whether different types of ad usage associated with commonly complained about ad
aspects could lead to a difference in app ratings.
Broadly speaking, the results of my investigation confirm that many types of ad usage patterns
that are frequent topics of ad-related complaints are likely to lead to differences in ratings. More
strikingly, my results show that even seemingly minuscule differences are not only statistically
significant but impactful as well. In several RQs, the difference between S1 and S2 was less than
0.05 stars. However, even such a small value was significant in terms of impact on the app's
potential success, as this difference could lead to a significant change in rankings. In turn, this
could cause an app to drop out of the always important first page of results, potentially negatively
impacting an app's chance for success and a high number of downloads.
The results of my investigation also led me to propose several guidelines for developers. Overall,
I found that lower ratings were generally associated with apps that (1) used interstitial ads instead
of banner ads; (2) had ad frequency ratios close to 0.77; (3) used more than one ad per activity;
(4) used banner ads with a larger height, closer to 366 pixels; (5) had ads appear in the middle or bottom of
the activity; (6) placed ads on the initial landing page of an app; or (7) had repeated ad content.
Many of these findings are intuitive in hindsight. However, in my survey of developer blogs and
suggested ad practices produced by MANs, I found that many did not contain these guidelines,
gave conflicting or overly general guidance with regard to the guidelines, or, in cases where a guideline
matched mine, did not have any empirical evidence to support the recommendations. To highlight
this, I compare my findings against the guidelines supplied by AdMob, one of the world's largest
mobile advertising networks with over 60% market share for all installs [11]. (1) AdMob promotes
the use of both banner and interstitial ads, which contradicts my finding that interstitial ads are
associated with lower ratings than banner ads; (2) AdMob does restrict developers to, at most,
one ad per activity, which is consistent with my findings, but does not limit the overall frequency
of ads in apps, which I have found to be more associated with lower ratings; (3) AdMob suggests
that the height of ads should be no more than 250p; however, I have found that an ad height close to
150p is associated with higher ratings; (4) AdMob advises developers to avoid the middle of the
app for placing ads, which is consistent with my findings, but does not caution against placing ads
at the bottom of the activity, which I found was also associated with lower ratings; (5) AdMob
cautions against placing interstitial ads on landing pages, which is consistent with my findings,
but does not warn against banner ads in the landing page, which I found had a similar association
with lower ratings; and (6) AdMob suggests that the ad refresh rate should be at least 60 seconds
to allow for user engagement but does not establish an upper bound, which leads to complaints
about repeated ad content, which is in turn associated with lower ratings. Taken together, these
contrasts show that the guidelines I have identified as a result of this work can further advance
developer practice by reducing the negative impact of ads on an app.
Chapter 6
Quantification and Analysis of Non-UI-related Ad Metrics
Mobile advertising has become an important part of many software developers' marketing and
advertising strategies [24]. This development has come about in just a matter of a few years. In
2010, the mobile advertising industry's revenue was just over half a billion dollars [106], but by
2013 it reached over 17 billion dollars [6], and in the first quarter of 2014 it had already reached over
11 billion dollars [87]. By 2017, analysts predict that revenue from mobile advertising will exceed
that of TV advertisements [32] and account for one in three dollars spent on advertising [23].
The presence of mobile ads has become pervasive in the app ecosystem with, on average,
over half of all apps containing ads [80]. This has been driven by the development of large-scale
advertising networks, such as Google Mobile Ads and Apple iAD, that facilitate the interaction
between developers and advertisers. To earn ad revenue, developers display ads in their apps by
making calls to APIs provided by an advertising network. When the ads are displayed on an end
user's device, the developer receives a small payment. A typical business model for a developer is
to place ads in their apps and then release the app for free with the hope that the ad revenue will
offset the cost of the app's development. In general, this model is perceived as a win-win situation
for both developers and end users: developers receive a steady, and sometimes large, ad-driven
revenue stream, and end users receive a "free" app.
A key problem in this model is that it depends on the perception that, aside from app devel-
opment, there are no additional costs to either the end user or software developer. While this is
true for direct costs, this fails to account for the indirect hidden costs of embedding mobile ads
in an app. On the end users' side, indirect hidden costs come in several forms: loading ads from
a remote server requires network usage, for which many users are billed by the amount of bytes;
loading and rendering ads requires CPU time and memory, which can slow down the performance
of an app; and finally, all of these activities require battery power, which is a limited resource on
a mobile device. The above ad aspects correspond to those non-UI topics in Chapter 4 that are
not observable by users from the user interface on the screen. Developers have hidden costs as
well. It is necessary to maintain the code that interacts with the advertisements, which requires
developer effort. The ratings and reviews a developer receives can also be affected. Studies have
shown that over 70% of users find in-app ads "annoying" [22], and such users may give an app
a lower rating or write negative reviews. This negative response may then affect the number of
downloads of an app, which in turn can affect the developer's future ad revenue.
In this chapter I present the results of my investigation into the hidden costs (i.e., non-UI
ad aspects) of mobile advertising for both users and software developers. To carry out this
investigation, I performed an extensive empirical analysis of 21 real world apps from the Google
Play app store that make use of mobile advertising. My analysis considered five types of hidden
costs: app performance, energy consumption, network usage, maintenance effort for ad-related
code, and app reviews. The results of my investigation show that there is, in fact, a high hidden
cost of ads in mobile apps. My results show that apps with ads consume, on average, 48% more
CPU time, 16% more energy, and 79% more network data. I also found that developers, on
average, make ad-related changes in 23% of their releases. The presence of mobile ads also has
a rating and review cost, as I found that complaints related to ads and these hidden costs were
relatively frequent and had a measurable impact on an app's rating. Overall, I believe that these
findings are significant and will help to inform software developers so they can better weigh the
tradeoffs of incorporating ads into their mobile apps, understand the impact ads have on their
end users, and improve end users' app experience.
6.1 Motivation
Ads occupy a unique position in the mobile app ecosystem. Strictly speaking, they are not
required for the correct functioning of an app. Yet they are essential for monetizing the app and
ensuring that developers can profit from their work. When considering their profit, developers
typically assume that the only cost they have associated with the app is development and normal
maintenance. Conversely, end users typically assume that their only cost comes in the form of
viewing ads and, perhaps, paying an upgrade fee to get an ad-free version. At some level, these are
reasonable assumptions, since these costs, or lack thereof, are clearly visible to both parties. As I
argue in Chapter 4, there are, in fact, other costs related to non-UI ad aspects. I refer to these as
hidden costs because for both parties they can go unnoticed, and even if recognized as costs, can
be difficult to quantify without additional infrastructure and analysis. In this chapter, I perform
a systematic investigation to quantify five such hidden costs. Three of these directly affect end
users' mobile devices: network usage, energy consumption, and app runtime performance. They
cover most of the non-UI ad topics that I categorized from user reviews in Chapter 4. Two of them
more directly affect developers: ad-related maintenance and user ratings. I chose to investigate
these specific hidden costs because they represent categories for which I have identified a process
for measuring their costs and also quantifiable ways of showing their impact on both developers
and end users. Below I formally present the research questions (RQs) with respect to these hidden
costs and motivate their inclusion in my study.
RQ 1: What is the performance cost of ads?
Mobile apps display ads by using APIs provided by an ad network. As with all other invocations,
executing these methods requires the device to commit processing power (e.g., CPU) to carry
out the ad-related functionality. The processing power consumed by the ad libraries is
processing power that could otherwise have been available to the app to improve its own
performance. Runtime performance is important to end users because it influences how "fast" or
"responsive" they perceive an app's implementation to be. Among the non-UI ad topics I categorized
in Chapter 4, ad-related performance problems could block an app's execution (i.e., topic blocking), make an
app crash (i.e., topic crash), or slow an app down (i.e., topic slow). In this research
question, I focus on the runtime performance of an app and how it is affected by the additional
processing necessary to carry out ad-related functionality.
RQ 2: What is the energy cost of ads?
Mobile devices are energy-constrained because they run on battery power, which is
limited. Therefore, energy efficiency is an important concern for apps that run on mobile devices.
The display and the network interface are two of the most energy-consuming components on
a mobile device [67, 64]. These two components also serve an important role in the mobile ad
ecosystem since they are used to retrieve and show ads. In this research question I quantify the
energy impact of ads in mobile apps (w.r.t. the non-UI ad topic battery in Chapter 4). This
energy cost is hidden to users because, although they are aware of battery limitations, they do not
have any way to isolate and evaluate the energy cost of the ad functionality which is embedded
in the mobile app. A high energy cost is impactful because running out of battery power renders
a device unusable or requires extra recharging cycles.
RQ 3: What is the network cost of ads?
Network access plays an essential role in the ad network infrastructure. Developers insert
invocations to ad network APIs that then send requests to ad servers for an ad to display. In
turn, the ad servers transmit advertising content back to the mobile apps to be displayed. All
of these require network usage by the mobile device, even if the app containing the ad does not
require network access itself. In many cases, network usage has a cost for end users who must
pay for Internet access or pay data charges for data access over a certain preset limit. This
could trigger user complaints such as those related to topics paid and automated in Chapter 4.
Although there is a direct cost associated with network usage, end users lack visibility into how
network is consumed. At best, they may use tools, such as Shark or Root [3], to monitor their
apps' network usage, but do not have any mechanism to distinguish how much of this usage is
related to ads. Therefore, this remains a hidden cost to them.
RQ 4: What is the rate of app updates related to ads?
Part of the development cost of an app is maintenance. This includes responding to bug
reports, adding new features, and evolving the app due to changes in the underlying OS and
platform. Prior work has shown that app developers frequently add, remove, or update ad-
related code in an app [80, 90]. This finding suggests that there may be a high maintenance cost
associated with the use of ad libraries. This motivates further investigation to determine how
much maintenance effort is caused by the use of ad libraries. In this research question, I examine
ad-related code in the apps and track its evolution over different app versions in order to isolate
the ad-related maintenance effort.
RQ 5: What is the impact of ads on an app's ratings?
The Google Play app store allows users to write reviews and provide a rating (between one and
five stars) for the apps that they have downloaded. Good app ratings and reviews are essential
for the success of an app in the marketplace. Prior research has shown that app ratings are highly
correlated with app downloads [51]. Prior work has also shown that surveyed end users generally
have unfavorable perceptions of ads in mobile apps [62, 97, 103]. Therefore, it is possible that
these unfavorable reactions carry over and influence end users to give poor reviews for the app.
In this research question, I examine end user reviews and determine what the possible impact of
mobile ads is on the rating of an app.
6.2 Case Study Design
The goal of this case study is to investigate the hidden cost of mobile advertisements to end users and
software developers. To carry out this investigation I designed a case study to capture and analyze
ad-related information and various other types of runtime metrics. In this section I explain how
I selected the apps for the study, the process for identifying and instrumenting ad behavior, the
creation of repeatable and automatically replayable workloads, and the monitoring and analysis
framework. I explain each of these aspects of the case study design in more detail below.
6.2.1 Selection of Subject Applications
For my case study, I had five criteria for selecting the set of subject applications. These were:
(1) successful apps, indicating that the developers had found a balance of functionality and
ad usage; (2) representative of different categories of apps, to enable my results to generalize
to a broader pool of apps; (3) actively maintained with frequent releases, so I could examine
maintenance costs over time; (4) use of mobile ads; and (5) convertible to and from Java bytecode
using the dex2jar [2] tool, since I needed to perform bytecode manipulation of the apps' classes
to facilitate the monitoring and analysis.
To obtain apps that met the first two criteria, I took the top 400 apps in each of the 30
categories of Google Play as ranked by Distimo [5], an app analysis company that ranks apps
based on user ratings, number of downloads, and other measures of success. Not all categories
had 400 apps in the list of top apps. Therefore, in the end I had a list of 10,750 apps from all
30 categories of Google Play. I crawled the Google Play app store every day using a farm of
systems for eight months (from Jan 2014 to Aug 2014) to download every new release of each
app and its associated metadata, such as the average user rating, the number of users rating the app,
and user reviews (500 at a time), among other things. To satisfy the third criterion, I sorted the
10,750 apps by the number of releases that each app had in the time frame of data collection
(from Jan 2014 to Aug 2014). Then, I selected the top 21 apps from this list, which represented
14 different categories of apps (e.g., travel, media, etc.). To satisfy the fourth criterion, I identified
the apps in the corpus that made use of the Google Mobile Ads (GMA) network. I identified an
app as making use of GMA if it contained invocations of APIs provided by the GMA and had
visible ads displayed in some part of the user interface. I focused my investigation on only one
ad network to control for variability of costs between ad networks, and chose GMA in particular
because it is the most popular and widely used, representing over 46% of the mobile ad business
in the first quarter of 2014 [6]. Finally, I converted each app to Java bytecode using dex2jar,
repackaged it using the Android SDK tools, and then manually verified that it executed without
failure to ensure it met the fifth criterion.
shown in Table 6.1. In this table I list the app's name, provide it with a unique ID that I use to
identify it in the evaluation graphs, its package name, physical size of the app's APK file, and the
category assigned to it by the Google Play app store. I also include the number of versions, the
number of reviews, and the average rating of each app for the time period between January 2014
and August 2014.
Table 6.1: Subject applications for the quantification and analysis of non-UI-related ad metrics
ID App Name Package Name Category Size (MB) # Versions # Reviews Avg. Rating
M1 Restaurant Finder com.akasoft.topplaces travel & local 3.7 24 464 4.35366
M2 Smileys for Chat (memes,emoji) com.androidsx.smileys communication 15.9 16 613 4.32011
M3 Arcus Weather com.arcusweather.darksky weather 2.8 30 513 4.32317
M4 Polaris Navigation GPS com.discipleskies.android.polarisnavigation travel & local 7.8 29 960 4.41557
M5 3D Sense Clock & Weather com.droid27.d3senseclockweather travel & local 10.7 20 399 4.42509
M6 Drudge Report com.iavian.dreport news & magazines 1.5 20 1317 4.24225
M7 Podcast Republic com.itunestoppodcastplayer.app news & magazines 3.6 39 1723 4.58928
M8 Followers For Instagram com.noapostroph3s.followers.instagram social 2.4 17 1337 3.75924
M9 Public Radio & Podcast com.nprpodcastplayer.app news & magazines 3.2 21 671 4.2379
M10 English for kids learning free com.oman.english4spanishkids education 8.0 21 90 4.13483
M11 Lomo Camera com.onemanwithcameralomo photography 29.5 20 942 4.33325
M12 Smart Booster - Free Cleaner com.rootuninstaller.rambooster tools 3.5 22 1258 4.50653
M13 Pixer com.sixtyphotos.app social 4.4 27 599 4.36689
M14 The Best Life Quotes com.socialping.lifequotes entertainment 2.6 31 784 4.42554
M15 SofaScore LiveScore com.sofascore.android sports 9.9 39 1158 4.72082
M16 Player dreams com.team48dreams.player music & audio 1.9 40 693 4.40827
M17 VLC Direct Streaming Pro Free com.vlcforandroid.vlcdirectprofree media & video 2.3 25 868 4.28025
M18 Translator Speak & Translate com.voicetranslator.SpeakAndTranslateFree travel & local 5.3 22 1024 4.38482
M19 7Zipper org.joa.zipperplus7 communication 7.6 19 699 4.66026
M20 Guess The Song quess.song.music.pop.quiz trivia 19.5 40 3426 4.62825
M21 Radaee PDF Reader radaee.pdf productivity 4.6 21 587 4.41044
6.2.2 Instrumentation of the Subject Applications
Addressing the research questions outlined in Section 6.1 requires that I have two versions of
each app, one with ads and the other without. To create the no-ads version of an app, I used
instrumentation-based techniques to remove all invocations of APIs defined by the ad network.
Note that some prior approaches have simply replaced the ad library with "dummy" implementations
[79]; however, I chose to completely remove the invocations since there is a non-zero time
and energy cost associated with even an invocation of an empty method [48, 64]. To perform the
instrumentation, I first converted each app's APK into the corresponding Java bytecode using
dex2jar. Then I used the ASM library [1] to analyze the bytecode of each class of each app and
identify ad-related invocations. These invocations could be identified by matching the package
name (e.g., "ads," "mobileads," and "mobads") of the invocation's target method with that of
known ad networks. The package names of ad networks can be found by examining their API
documentation. For each ad-related API invocation identified, I wrote instrumentation code to
remove the invocation and, where possible, any other support objects created as arguments to
the invocation. In some cases, it was not possible to remove all references to the ad library.
Namely, if an invocation unrelated to ads had an ad-related argument, then I could not remove
the initialization of that argument. In subject apps there were 141 such problematic invocations,
out of a total of 716 ad-related invocations. After instrumentation, I repackaged the app and then
veried the removal of the ads with two checks. First, I manually executed the apps and veried
that there were no visible ads. Second, I used tcpdump on the smartphone's local network to
see if there were any signs of an ad API accessing the network. To create the version of the app
with ads, I decompiled then repackaged each app without removing the ads. I did this to control
for any bytecode level transformations introduced by dex2jar, asm, or dx, which would have also
occurred in the no-ads version.
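For illustration, the following is a minimal sketch of the kind of ASM-based transformation
described above; it is not the exact tool used in this study. The package prefixes are illustrative
assumptions, and, for simplicity, the sketch only removes calls whose target method returns void,
by popping the call's receiver and arguments from the operand stack.

    import org.objectweb.asm.ClassVisitor;
    import org.objectweb.asm.MethodVisitor;
    import org.objectweb.asm.Opcodes;
    import org.objectweb.asm.Type;

    // Sketch only: drops invocations whose target class belongs to an assumed ad package.
    public class AdCallRemover extends ClassVisitor {
        private static final String[] AD_PREFIXES = {"com/google/ads/", "com/google/android/gms/ads/"};

        public AdCallRemover(ClassVisitor next) { super(Opcodes.ASM9, next); }

        @Override
        public MethodVisitor visitMethod(int access, String name, String desc, String sig, String[] exc) {
            MethodVisitor mv = super.visitMethod(access, name, desc, sig, exc);
            return new MethodVisitor(Opcodes.ASM9, mv) {
                @Override
                public void visitMethodInsn(int opcode, String owner, String mName, String mDesc, boolean itf) {
                    if (isAdOwner(owner) && Type.getReturnType(mDesc).getSort() == Type.VOID) {
                        Type[] args = Type.getArgumentTypes(mDesc);
                        for (int i = args.length - 1; i >= 0; i--) {          // pop arguments, last first
                            super.visitInsn(args[i].getSize() == 2 ? Opcodes.POP2 : Opcodes.POP);
                        }
                        if (opcode != Opcodes.INVOKESTATIC) {
                            super.visitInsn(Opcodes.POP);                     // pop the receiver
                        }
                        return;                                               // invocation removed
                    }
                    super.visitMethodInsn(opcode, owner, mName, mDesc, itf);
                }
            };
        }

        private static boolean isAdOwner(String owner) {
            for (String p : AD_PREFIXES) if (owner.startsWith(p)) return true;
            return false;
        }
    }

A complete tool would also have to handle non-void return values and the support objects created
solely as arguments to such calls, which is exactly why 141 of the 716 ad-related invocations could
not be fully removed.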
6.2.3 Generation of Subjects' Workloads
For each app, I created workloads to execute the app and exercise its functionality. The goals
for each workload were: (1) to be as complete as possible with respect to the app's primary
functions; (2) to be repeatable across multiple executions of the app; and (3) to be long enough
to ensure several ad reload cycles.
To generate workloads, I leveraged the RERAN tool [35]. This tool records a user's interaction
with an app as a series of events and can then replay these events at a later time. To generate an
initial workload, I interacted with each app and tried to invoke as much functionality as possible.
For example, I clicked the different buttons and labels on a screen, paused for some time on each
new page, navigated back or on to another page, and entered values into text and search boxes.
Although these workloads may not be representative of realistic usage, they provided reasonably
complete coverage of each app's key functions. On average, I interacted with each app for 1.5 to
4 minutes. This amount of time was chosen because GMA can be set to refresh every 30 to 120
seconds, so this interaction length would ensure several ad reloads. After creating an initial
workload, I repeated the execution of the workload several times, manually verified that the
execution of the app was deterministic with respect to the sequence of actions, and identified
any system state (e.g., system settings to reset and service-related processes to kill) that needed
to be restored prior to the replay of the interaction. In many cases, the execution of the no-ads
version required a systematic shift of the X and Y coordinates of certain user events (e.g., a
touch) due to the absence of a displayed ad, and I corrected the RERAN traces at this time.
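As an illustration of the coordinate correction, the sketch below applies a fixed Y offset to touch
events in a recorded trace. The line format assumed here (a getevent-style line ending in an
ABS_MT_POSITION_Y code and a hexadecimal value) is a simplification; the actual RERAN trace
format differs, so this only illustrates the idea of the systematic shift.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.FileWriter;
    import java.io.IOException;
    import java.io.PrintWriter;

    // Sketch only: shifts the Y coordinate of recorded touch events by a fixed offset
    // (e.g., the height of a removed banner ad). Assumes lines of the form
    // "<device> <type> ABS_MT_POSITION_Y <hex value>".
    public class ShiftTouchEvents {
        public static void main(String[] args) throws IOException {
            String inFile = args[0], outFile = args[1];
            int yOffsetPx = Integer.parseInt(args[2]);
            try (BufferedReader in = new BufferedReader(new FileReader(inFile));
                 PrintWriter out = new PrintWriter(new FileWriter(outFile))) {
                String line;
                while ((line = in.readLine()) != null) {
                    String[] f = line.trim().split("\\s+");
                    if (f.length == 4 && f[2].equals("ABS_MT_POSITION_Y")) {
                        long y = Long.parseLong(f[3], 16);                    // getevent values are hex
                        f[3] = String.format("%08x", Math.max(0, y + yOffsetPx));
                        line = String.join(" ", f);
                    }
                    out.println(line);
                }
            }
        }
    }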
6.2.4 Monitoring and Analysis of Subject Applications
To collect runtime data on the hidden costs, I ran both versions of each app (with-ads and
no-ads) while monitoring their execution. The mobile device I used was a Samsung Galaxy SII
smartphone with a rooted Android 4.3 operating system. For each version, I first restored the
system environment to its original state. Then I loaded the app on the mobile device and started
its execution. Before beginning the replay, I allowed the system to sleep for 15 seconds to ensure
that the initial page had completely loaded and displayed. Then I began the RERAN replay.
During the replay execution of the app, I recorded statistics about the execution. This process
was repeated four times for each experiment to minimize the impact of background noise, and in
each iteration, the order of the apps and versions was changed. The specific statistics and
measurements taken during the execution varied according to the addressed research question. I
elaborate on the measurements and metrics for each research question in Section 6.3.
6.3 Results and Discussion
In this section I discuss the details of the experiments I carried out to address each of the RQs
defined in Section 6.1. For each RQ, I describe the approach I employed to capture the relevant
metrics and measurements, present the results I obtained, and discuss the implications of these
results with respect to each of the RQs. Essentially, each of the subsections in this section describes
the monitoring and analysis portion of my case study (Section 6.2.4) as it was customized to
address the RQs.
6.3.1 RQ 1: What is the performance cost of ads?
Approach: To determine the performance cost of mobile ads, I measured two performance
metrics, CPU utilization and memory usage, on both the with-ads and no-ads versions of each
app. To obtain these metrics, I ran the standard top utility on the mobile device while it was
executing each app version. I set the top tool to record the two performance metrics on a
one-second interval. Specifically, top recorded the CPU percent utilization and the amount of
memory in the Resident Set Size (RSS), which indicates how many physical pages are associated
with the process. Since the running app was the only one with a visible foreground process
during the replay, the RSS reflected the physical pages of the app's process. I then calculated
the average value of each metric for each app. Note that even though running top can affect the
mobile device's performance, I verified through experiments that the effect of top was consistent
across app executions and versions.
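For reference, a sketch of how the one-second top samples could be reduced to per-app averages is
shown below. The column indices are assumptions about the output of the device's top command
(the real log would need to be inspected to pick the right fields), and the package name is
illustrative.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    // Sketch only: averages CPU% and RSS over all top samples that mention the app's package.
    public class TopLogAverager {
        public static void main(String[] args) throws IOException {
            String logFile = args[0];
            String pkg = args[1];                             // e.g., "com.arcusweather.darksky"
            double cpuSum = 0, rssKbSum = 0;
            int samples = 0;
            try (BufferedReader in = new BufferedReader(new FileReader(logFile))) {
                String line;
                while ((line = in.readLine()) != null) {
                    if (!line.contains(pkg)) continue;
                    String[] f = line.trim().split("\\s+");
                    cpuSum += Double.parseDouble(f[2].replace("%", ""));   // assumed CPU% column
                    rssKbSum += parseKb(f[6]);                             // assumed RSS column
                    samples++;
                }
            }
            System.out.printf("avg CPU = %.1f%%, avg RSS = %.0f KB (%d samples)%n",
                    cpuSum / samples, rssKbSum / samples, samples);
        }

        private static double parseKb(String s) {              // RSS may be printed as "43212K" or "42M"
            if (s.endsWith("M")) return Double.parseDouble(s.substring(0, s.length() - 1)) * 1024;
            if (s.endsWith("K")) return Double.parseDouble(s.substring(0, s.length() - 1));
            return Double.parseDouble(s);
        }
    }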
Results: The results of these experiments are shown in Figure 6.1. The dark red bar shows
memory usage, and the light blue bar shows CPU utilization. Each bar in this graph represents
the percent difference between the performance metrics for the with-ads and no-ads versions listed
along the X axis. A positive number means that the with-ads version had a higher value for the
metric. As the results show, the with-ads version had a higher performance cost for all of the
subject apps. The median memory increase was 22%, and the median CPU utilization increase
was 56%.
Discussion: Overall, the results show a markedly higher resource consumption for apps with
mobile ads. I expect that this result is due, in part, to managing the increased network usage
Figure 6.1: Relative performance cost in terms of memory usage (dark red bar) and CPU utilization
(light blue bar): each bar represents the percent difference between the performance metrics for
the with-ads and no-ads versions. A higher number means that the with-ads version had a higher
value for the metric.
that I find is associated with ads in Section 6.3.3. I also expect that retrieving and updating ads
occur when an app might otherwise be in an idle state waiting for a user event. Mobile apps
actually spend a significant amount of their perceived runtime in an idle state [64]. Therefore,
even the addition of a small amount of activity, such as managing ad interactions, can lead to a
surprisingly large increase in CPU utilization. Case in point: in my experiment the median with-
ads and no-ads actual CPU utilization was 20% and 7%, respectively. I hypothesized that the
increase in CPU utilization was also likely to indicate that end users would experience a slowdown
in response time. To evaluate this, I instrumented the Android event handlers and activities for a
subset of the subject apps using an instrumentation tool I had developed for Android apps in prior
work [65, 48, 49], which is based on the efficient path profiling technique by Ball and Larus [13].
I was unable to instrument all of the apps because the tool used BCEL, which was limited in
its ability to process some of the apps' bytecode. Nonetheless, using this tool I was able to
instrument and measure the execution time of the with-ads and no-ads versions of eight apps. I
found that, on average, the with-ads versions took 7% longer to complete their event handling and
activities. This suggests that including mobile ads has a measurable impact on the responsiveness
of the app.
6.3.2 RQ 2: What is the energy cost of ads?
Approach: To determine the amount of energy consumed by mobile ads, I modeled the energy
consumption of each app's with-ads and no-ads versions during the workload replay. The goal
was to predict the energy consumption of mobile ads. My prior work [43] showed that ad energy
could be accurately modeled and estimated. In that work, the energy consumption of ads was
shown to comprise three parts: the system, the network, and the display. Before beginning the
replay, I started the measurements to record information about the system (CPU time and memory
usage), network (total bytes transmitted), and display (screenshots) components. Then I noted
each screenshot's display time and the times at which the replay began and ended. After collecting
the above information, I applied the model to estimate the energy consumption of each component.
The total energy consumption of an app was calculated as the sum of all energy consumed between
the beginning and ending timestamps. The difference between the energy consumption of the with-
ads and no-ads versions represented the energy cost associated with running the mobile ads. Unlike
the performance metrics measured in Section 6.3.1, I included the energy incurred while the app
was idle. The reason for this is that the display represents a significant portion of an app's energy
consumption and visible ads are directly responsible for a portion of that display energy.
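Restated as equations (a simplified view of the model from my prior work [43], with the
per-component regression details omitted), the per-version estimate and the ad energy cost are:

    E_{app} = E_{system} + E_{network} + E_{display}, \qquad
    E_{ads} = E_{app}^{with\text{-}ads} - E_{app}^{no\text{-}ads}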
Figure 6.2: Relative energy cost: each bar represents the percent difference between the energy
metric for the with-ads and no-ads versions. A higher number means that the with-ads version
consumed more energy.
Results: The results of the energy comparison are shown in Figure 6.2. For each app along the
X axis, the chart shows the relative energy consumption increase of the with-ads version over the
no-ads version. For all apps, there was always an increase in energy consumption associated with
the with-ads version. The energy increase ranged from 3% to 33%, with a median of 15%.
Discussion: For some of the apps, the energy consumption related to ads is quite high. I found
that six apps had an increase of over 20% in their energy consumption due to their use of mobile
ads. With energy, it is important to note that a high cost does not directly translate into a high
financial cost. For example, even a 33% energy increase only represents 50 Joules. The cost to
recharge a battery with this amount of energy is negligible. However, a high energy cost can
translate into a usability cost for a mobile device. Consider the following illustrative scenario. A
typical battery for the Samsung Galaxy SII smartphone contains 2.5 hours of charge. If the SII
were to run only the with-ads version of the app with the median energy consumption, then the
charge would last 2.1 hours instead of 2.5 hours. With the most expensive ad energy cost, this
number decreases to 1.7 hours. This means that an end user would have to recharge their phone
33% more often to compensate for the most expensive ad's energy cost. Overall, this decreased
battery lifespan could impact the usability of a mobile device as users would have to charge it
more often and have a shorter amount of time in which they could use their phones.
6.3.3 RQ 3: What is the network cost of ads?
Approach: To address this research question, I collected measurements of an app's network usage
during the replay of the app's workload. The first of these measurements, data usage, is the total
number of bytes sent and received by the app during the replay, and the second measurement,
the number of packets, is the count of network packets sent and received by the app during the
replay. To obtain these measurements, I ran tcpdump on the smartphone's network connection
to record every packet sent and received on the smartphone during the workload replay. I then
analyzed the captured network trace to compute the measurements. This process was repeated
for the with-ads and no-ads versions of each app. I then calculated the relative difference of both
metrics for the two versions of each app.
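To make the trace analysis concrete, the following sketch computes the two measurements directly
from a raw capture file written by tcpdump. It assumes the classic libpcap file format (a 24-byte
global header followed by 16-byte per-record headers); it is an illustration, not the exact analysis
script used here, and ignores details such as pcapng files or nanosecond timestamps.

    import java.io.BufferedInputStream;
    import java.io.DataInputStream;
    import java.io.EOFException;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;

    // Sketch only: counts packets and total bytes in a classic libpcap capture.
    public class PcapCounter {
        public static void main(String[] args) throws IOException {
            try (DataInputStream in = new DataInputStream(
                    new BufferedInputStream(new FileInputStream(args[0])))) {
                byte[] global = new byte[24];
                in.readFully(global);
                int magic = ByteBuffer.wrap(global, 0, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
                ByteOrder order = (magic == 0xa1b2c3d4) ? ByteOrder.LITTLE_ENDIAN : ByteOrder.BIG_ENDIAN;

                long packets = 0, bytes = 0;
                byte[] rec = new byte[16];
                while (true) {
                    try { in.readFully(rec); } catch (EOFException eof) { break; }
                    ByteBuffer bb = ByteBuffer.wrap(rec).order(order);
                    int inclLen = bb.getInt(8);        // bytes captured in this record
                    int origLen = bb.getInt(12);       // bytes that were on the wire
                    in.readFully(new byte[inclLen]);   // skip the packet payload
                    packets++;
                    bytes += origLen;
                }
                System.out.println("packets=" + packets + ", bytes=" + bytes);
            }
        }
    }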
Results: Figure 6.3 shows the results of this experiment. The Y axis shows the relative difference
for each of the apps listed along the X axis. The light blue bar represents the percent difference in
data usage and the dark red bar represents the percent difference in the number of network packets.
A positive number indicates that the with-ads version had a higher value for the measurement.
The results show that the with-ads version always had a higher data usage and packet count
Figure 6.3: Relative network cost in terms of data usage (light blue bar) and number of network
packets (dark red bar): each bar represents the percent difference between the network metrics
for the with-ads and no-ads versions. A higher number means that the with-ads version had a
higher value for the metric.
than the no-ads version. For the subject apps, the median increase in data usage over the no-ads
version was 97% and for packet usage it was 90%. The results also show that the differences for
data usage and packet count were generally within a few percentage points of each other for each
of the apps.
Discussion: Overall, the results show that there is a very high network cost associated with
mobile ads. Moreover, there were several cases in which the percent increase was 100%, which
indicated that almost all of the network traffic for the app was due to ads. There were also four
apps with relatively low network cost increases. These four were the heaviest network users. For
example, M7 and M20 played songs during their replay, so the increase due to ads was smaller
relative to the overall network usage of the app. I also analyzed the data in more detail to better
understand the potential impact of the ad-related network traffic. I looked at the impact in terms
of potential cost in dollars and energy inefficiencies. For dollar cost, I calculated the median
absolute increase in network usage, which was 243,671 bytes, and multiplied this by the average
cost per MB of a major US-based carrier (AT&T), which was $0.07 in 2013 [4]. From this I
determined that each execution of the with-ads version could potentially cost end users $0.017
more in terms of network charges. Although this type of cost would only apply in situations
where users were paying for metered data, I note that, in the case of data overage charges,
this would be the amount that could be directly attributed to the ads. With regards to energy
inefficiency, mobile devices are energy-inefficient when they send packets that are smaller than
the maximum packet size. This is because there is a fixed overhead for sending packet headers,
which is amortized over larger packet sizes [63]. So I looked at the average percentage of packets
involved in communication with the ad networks that were smaller than the maximum packet size
and, therefore, were suboptimal with respect to energy usage. For this analysis, I focused on
the 16 apps that had almost all (over 80%) of their network traffic due to advertisements, so I
could more accurately characterize only ad-related traffic. I also excluded all TCP control-related
packets (e.g., SYN and ACK packets). I calculated this number for both the with-ads and no-ads
versions, and then subtracted the no-ads count from the with-ads count to isolate the number of
suboptimally sized packets that were due just to ad traffic. The results of this analysis indicate
that, for these apps, over 10% of the advertising traffic was within this size range and, therefore,
made sub-optimal use, energy-wise, of the network resources. Taken together, the results of
these additional analyses show that the increased network usage of embedded ads can also result
in real dollar costs for end users and often represents an energy-inefficient use of network resources.
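As a worked check of the dollar figure above (treating 1 MB as 10^6 bytes):

    243{,}671 \text{ bytes} \times \frac{\$0.07}{10^{6}\ \text{bytes}} \approx \$0.017 \text{ per execution}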
Figure 6.4: Ad-related maintenance cost: each bar represents the ratio of the number of app
versions that had ad-related changes to the number of app versions that have been released. A
higher number means that the app had a higher ratio of app versions with ad-related changes.
6.3.4 RQ 4: What is the rate of app updates related to ads?
Approach: My goal in this research question is to determine the cost, in terms of maintenance
effort for the developer, of including ads in the apps. Since I do not have the development history
or the source code repository of the subject apps used in my case study, I approximate maintenance
effort as the number of releases of the app in which a developer performed ad-library-related
changes. Note that this metric does not imply that a release was only about an ad-related change,
only that the release includes ad-related changes. For every release of each app, I decompiled
the release and extracted the calls to the GMA network. For each app, I determined whether
the ad-related calls were different from one release (Release i) to the next (Release i+1). I defined
"different" by treating the collection of ad-related calls as a multiset of tuples, where each tuple
comprised the containing method of the call and the target method of the call. If there was a
difference between the two multisets, then I concluded that an ad-related change occurred
between those two releases. Note that this definition does not count simple changes, such as a
call changing its location within its original method. I performed this analysis for all releases of
all apps to determine how many versions of each app had ad-related changes.
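The comparison itself is simple once the calls have been extracted; the sketch below illustrates it.
The AdCall type and the example method names are placeholders for whatever the decompilation
step produces; only the multiset-of-tuples comparison mirrors the definition given above.

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Sketch only: an ad-related change exists between two releases if their multisets of
    // (containing method, target method) tuples differ.
    public class AdChangeDetector {

        record AdCall(String containingMethod, String targetMethod) {}   // illustrative tuple type

        static Map<AdCall, Integer> toMultiset(List<AdCall> calls) {
            Map<AdCall, Integer> m = new HashMap<>();
            for (AdCall c : calls) m.merge(c, 1, Integer::sum);
            return m;
        }

        static boolean hasAdRelatedChange(List<AdCall> releaseI, List<AdCall> releaseIplus1) {
            return !toMultiset(releaseI).equals(toMultiset(releaseIplus1));
        }

        public static void main(String[] args) {
            List<AdCall> r1 = List.of(new AdCall("MainActivity.onCreate", "AdView.loadAd"));
            List<AdCall> r2 = List.of(new AdCall("MainActivity.onCreate", "AdView.loadAd"),
                                      new AdCall("MainActivity.onResume", "InterstitialAd.show"));
            System.out.println(hasAdRelatedChange(r1, r2));   // true: an interstitial call was added
        }
    }

Because a call that merely moves within its original containing method produces the same tuple,
such simple changes do not count as a difference, matching the definition above.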
Results: In Figure 6.4, I report the ratio of the number of app versions that had ad-related
changes to the number of app versions that have been released. Overall, the median value for this
metric was 22%. This indicates that half of the subject apps had ad-related changes in almost
one of every four releases. I also found that this metric had a wide range. For example, the apps
com.akasoft.topplaces and com.voicetranslator.SpeakAndTranslateFree had an ad-related change
in every other release, while at the other end of the spectrum, radaee.pdf did not have any
ad-related changes in its 21 releases.
Discussion: My results show that a considerable portion of releases had ad-related changes.
This was counter-intuitive, as the ad network libraries are generally straightforward to use and
stable. So I investigated my results further to try to understand the reason I saw such high
numbers. First, I compared the number of ad-related changes against the number of updates that
had been made to the GMA network libraries in the same time period. I determined the
number of updates to be five by looking at the release history of the GMA library. Of the subject
apps, 11 had either the same number of or fewer ad-related changes,
which could offer a possible explanation for their changes. However, there were still 10 apps
that had a higher number of ad-related changes. By investigating the reviews, I found another
possible explanation: users were reacting negatively to ad-related changes in the apps and
developers were responding to these complaints by modifying ad-related behavior. For example,
one of the users of the app com.discipleskies.android.polarisnavigation wrote a one-star review
for the app, stating: 'Last update full of annoying ads. Don't update.' Polaris was also one of the
apps above both the median and average number of ad-related changes. Similarly, one user of the
app com.noapostroph3s.followers.instagram complained (in a one-star review) that 'The update
didn't fix any bugs it only added ads!!!' This app also had a higher-than-average percentage of its
releases involving ad-related changes. I also note findings by Khalid and colleagues [59], reporting
that 11% of all user complaints in their subject apps occurred immediately after updates. I
hypothesize that app developers may be changing their apps' ad-related behavior to possibly
increase ad revenue, and then adjusting the ad behavior in response to user complaints. In future
work, I plan to investigate this hypothesis using more sophisticated static-analysis-based code
change techniques. Regardless of the reason behind the ad-related changes, it is important to
note that maintenance is an expensive part of the software lifecycle, and code that results in
higher-than-expected maintenance effort can represent a hidden cost to the developer.
6.3.5 RQ 5: What is the impact of ads on an app's ratings?
Approach: To address this research question, I investigated the impact of ads and hidden costs
on the reviews of the apps. To gather the review and rating information, I crawled the Google Play
app store and collected the reviews for each of the subject apps on each day between January
2014 and August 2014. Since Google Play only allowed me to retrieve 500 user reviews per day,
I retrieved up to 500 reviews on the first day and then, on each subsequent day, retrieved all of
the latest reviews (up to 500). Thus, if an app got fewer than 500 reviews, I was able to retrieve
all the reviews, but if there were more than 500 reviews, then I only got the 500 most recent.
In total, I collected 20,125 reviews for the subject apps. Of these reviews, I only considered the
one- and two-star reviews, since they have been shown, via sentiment analysis, to reflect user
complaints [58]. This gave me 2,964 reviews. I then analyzed the reviews to determine if any
of them had keywords related to ads (regex = ad/advert*) or any of the hidden costs defined in
RQ1-3 (regex = power/drain/recharg*/battery/batery/network/bandwidth/slow/hang). I chose
these particular keyword variations based on the prior experience of my collaborator in manually
examining user reviews for different types of user complaints [59].
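For illustration, the sketch below applies the same kind of keyword filter to one- and two-star
reviews. The Review type is a placeholder, and the regular expressions are my rendering of the
keyword lists given above, so the exact patterns used in the study may differ slightly.

    import java.util.List;
    import java.util.regex.Pattern;

    // Sketch only: flags low-star reviews that mention ads or one of the hidden-cost keywords.
    public class ComplaintFilter {
        record Review(int stars, String text) {}               // illustrative placeholder type

        private static final Pattern AD_WORDS =
                Pattern.compile("\\b(ads?|advert\\w*)\\b", Pattern.CASE_INSENSITIVE);
        private static final Pattern COST_WORDS =
                Pattern.compile("\\b(power|drain|recharg\\w*|battery|batery|network|bandwidth|slow|hang)\\b",
                                Pattern.CASE_INSENSITIVE);

        public static void main(String[] args) {
            List<Review> reviews = List.of(
                    new Review(1, "Last update full of annoying ads. Don't update."),
                    new Review(2, "Drains my battery so fast"),
                    new Review(5, "Great app"));
            long adComplaints = reviews.stream()
                    .filter(r -> r.stars() <= 2 && AD_WORDS.matcher(r.text()).find()).count();
            long costComplaints = reviews.stream()
                    .filter(r -> r.stars() <= 2 && COST_WORDS.matcher(r.text()).find()).count();
            System.out.println("ad complaints: " + adComplaints
                    + ", hidden-cost complaints: " + costComplaints);
        }
    }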
Results: In Figure 6.5 I present the percentage of one and two-star reviews where users complain
about ads or one of the hidden costs. Only two apps (com.androidsx.smileys and com.socialping.lifequotes)
had no ad-related complaints and all apps had complaints related to at least one of the hidden
costs. Overall, over 50% of the apps had at least 3.28% of their user complaints dealing with ads
and 5.04% dealing with hidden costs defined in RQ1-3. These numbers should be considered a
Figure 6.5: Percentage of complaints: each bar represents the percentage of one- and two-star
reviews with user complaints about ads (dark red bar) or one of the hidden costs (light blue bar).
A higher number means a higher ratio of complaints for the metric.
lower bound since I only considered complaints that explicitly mentioned one of the keywords and
it is possible I did not consider all possible ways to complain about a particular topic.
Discussion: In an absolute sense, the percentage of complaints about either ads or one of the
hidden costs may appear small. However, findings by Khalid and colleagues [59] put these numbers
into context. In their work they found 12 categories of user complaints by manually analyzing the
reviews of 20 iOS apps. They also found that seven of the 12 complaint categories had an occurrence
frequency of less than 3.28%. Therefore, I consider the complaint occurrences of ads and hidden
costs to be higher than average. One might wonder if the costs are indeed hidden since they draw
a higher-than-average number of complaints, but it is important to note that
these reviews together comprise only a little more than one percent of all of the reviews and, as
such, are not likely to register with the developer. Nonetheless, they do have a measurable, albeit
small, impact on the ratings. I recalculated each app's rating as it would be if the reviews complaining
about either ads or one of the hidden costs were removed. The average increase in rating
would be about 0.003 stars. Here again, this is a small number, but it should be noted that a
0.003 change is sufficient to change even the ranking of several of my subject apps if they were
ranked by rating.
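For reference, the recalculated rating is simply the average over the remaining reviews, where C
is the set of one- and two-star reviews complaining about ads or one of the hidden costs and N is
the total number of reviews:

    \bar{r}_{new} = \frac{\sum_{i=1}^{N} r_i - \sum_{j \in C} r_j}{N - |C|}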
I also performed a manual investigation of the complaints in order to better understand their
nature. I found that an overwhelming number of the reviews were about the
interference of the ads with the UI of the app (53% of all the reviews related to ads). Specifically,
users complained that
• there were too many ads (e.g., for app com.akasoft.topplaces, where a user says 'Too much
adverts'),
• the ads were too big or took up too much of the screen space (e.g., for app com.akasoft.topplaces,
where a user says 'Annoying full screen ads every time you run the app. Uninstall'), and
• the ads blocked essential functionality (e.g., for app com.onemanwithcameralomo, where a
user says 'Ads right in the middle of my photos uninstalling').
Therefore, I conclude that, in particular, ad placement within the UI can cause many negative
reviews and should be a developer concern when adding ads to an app.
The next most frequent complaint from users about ads is having ads in the app even after
getting a paid-for version of the app (28% of all reviews related to ads). More specifically, users
complained when
• they paid for an app (e.g., for app com.noapostroph3s.followers.instagram),
• they downloaded a paid version for free as part of a promotion (e.g., for app
com.onemanwithcameralomo), or
• they referred the app to other users (e.g., for com.sofascore.android).
Therefore, I conclude that the presence of ads in paid-for versions of an app is a trigger for
complaints. Developers should carefully weigh the benefits of extra ad revenue against the possibility
of upsetting paying customers.
I also noticed several interesting trends for different apps. The users of the apps
com.discipleskies.android.polarisnavigation and com.sixtyphotos.app thought that the ads were very
intrusive. In this case, the apps displayed ads even after the app was closed. Finally, one user of one
app (radaee.pdf) directly complained about the power consumed by the ads. In my results in
Section 6.3.2, I found that this app had an approximate energy cost of 20%. There were five apps
with higher energy costs that did not receive any such complaints, suggesting that such costs,
although high, are indeed hidden to the end user.
6.4 Generalizability
As described in Section 6.2.1, I chose only apps that used the GMA. While this helped to
control for ad network variance, it raises a possible threat to the external validity (or
generalizability) of the results. In a small study, I evaluated whether the results I described in
Section 6.3 held for other ad networks as well. Therefore, I chose two other popular ad networks,
Amazon Mobile Ads (AMA) and Mopub Mobile Ads (MMA).¹ I then chose two apps for
each mobile ad network with the same criteria as my subject apps (i.e., successful apps that
are actively maintained). These apps are: G1: com.bambuna.podcastaddict (AMA + GMA),
G2: com.x2line.android.babyadopter.lite (AMA + GMA), G3: com.vemobile.thescore (MMA +
GMA), and G4: com.slacker.radio (MMA + GMA). Each of these four apps had its respective
new ad network and the GMA (I could not find apps that had only other ad networks and not
GMA). I then repeated my experiments for all five research questions on these four apps.
Table 6.2 shows the results for these four apps. To place them in context, I show the minimum,
median, and maximum values for the corresponding results of the GMA apps. I find that in some
cases the results for the four new apps are similar to the median value, such as the memory cost
in RQ1. For other questions, such as the network costs in RQ3, the results for G1 (14% and 12%)
and G4 (28% and 29%) are far from the median values of 97% and 90%. However, for all the
¹ Mopub can be considered an ad mediation service [104], but serves as an ad network for the purpose of my
study.
Table 6.2: Comparison of non-UI-related ad costs among 21 apps with only one ad network, two
pairs of apps with two ad networks, and 773 apps that were used in the analysis of UI-related ad
aspects
(21 apps with GMA) (AMA+GMA) (MMA+GMA) (Large scale on 773 apps)
Metric (RQ) Min Median Max G1 G2 G3 G4 Min Median Max
CPU (1) 6 56 84 43 43 27 26 2 44 91
Memory (1) 3 22 37 15 25 14 21 1 21 65
Energy (2) 3 15 33 20 15 20 17 1 20 45
Bytes (3) 4 97 100 14 70 62 28 1 76 100
Packets (3) 5 90 100 12 78 61 29 2 72 100
Updates (4) 0 22 50 18 17 24 31 0 22 73
Complaints (5) 0 3.28 11 1.85 0 2.47 7.61 0 5 71
questions, the results for the four new apps are between the minimum and maximum values. This
shows that the apps with other ad networks do not have costs significantly better or worse than
apps that have only GMA. Therefore, I can be more confident that my results and conclusions
may be applicable to other ad networks, as well.
To address the possible threat related to the representativeness of the 21 subject apps used in
Section 6.2.1, I investigated the same large dataset (i.e., 773 apps), described in Section 5.2.1, that
focused on UI-related ad aspects. Five apps (com.akasoft.topplaces,
com.discipleskies.android.polarisnavigation, com.nprpodcastplayer.app, com.socialping.lifequotes,
and com.vlcforandroid.vlcdirectprofree) in the large dataset were also in the smaller dataset, but the
versions were different. I applied the same approach as in Section 6.3 to measure the corresponding
ad costs. Instead of generating workloads manually for RQ1-3, I automated the process using PUMA
to interact with the apps. In the end, 369 apps were successfully executed and had their ad-related
metrics measured. Those that failed to execute did so because of unsuccessful instrumentation or
expired ads (e.g., a blank ad area). Nevertheless, the number of successful apps is sufficient to
validate the representativeness, since this sample size represents a 95% confidence level with a 6%
confidence interval for all 10,750 apps. In Table 6.2, the last three columns show the results of the
large-scale study. They are the minimum, median, and maximum values for the different questions.
Note that for RQ1-3, the number of subject apps in the experiment is 369. By comparing these
last three columns with the results for the 21 subject apps (the first three columns), we can see
that the results are comparable except for the network usage (i.e., bytes and packets). However,
this network cost does not change my conclusions from the above analysis. Instead, for all the
questions, we see a significant cost associated with mobile ads in both sets of results. Therefore,
I can be more confident that my results and conclusions may be representative of other top-ranked
apps as well.
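As a rough check of this claim using the standard finite-population margin-of-error formula (95%
confidence, z = 1.96, worst-case proportion p = 0.5), a sample of n = 369 out of N = 10,750 apps
gives:

    e = z \sqrt{\frac{p(1-p)}{n} \cdot \frac{N-n}{N-1}}
      = 1.96 \sqrt{\frac{0.25}{369} \cdot \frac{10750-369}{10749}} \approx 0.05

i.e., roughly a 5% interval, which is within the 6% bound stated above.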
6.5 Threats to Validity
In this section, I discuss the threats to validity of my study and explain the steps I took to mitigate
those threats.
External Validity: The results and conclusions of my case study were based on a set of 21
Android apps. Since this set is significantly smaller than the population of all Android apps,
this could impact the generalizability of the conclusions. To mitigate this threat, I ensured that
all apps were real-world apps downloaded from the official app store for Android apps, i.e., the
Google Play app store. Additionally, these apps are popular and successful, have a high ranking
in the Distimo listing, have a high number of versions, and cover 14 different categories
of apps.
To address the threat related to the representativeness of these 21 apps, I conducted a large-
scale study on 773 apps using PUMA. This is an automated process. The traces generated by
PUMA may not be realistic, which is an intrinsic threat of the automated approach. However,
it is challenging to mimic and automate real users' interactions, and automated approaches, such
as PUMA or monkey, have been widely applied to large-scale studies in the literature [15, 50,
64, 70]. Note that, in the end, many apps were not successfully evaluated. The reason is that
the measurement of non-UI ad aspects requires app instrumentation to remove ads, and, due to
limitations of the ASM library, instrumentation failed for some apps. In addition, for the original
773 apps, the UI-related experiments were conducted three years before this evaluation, so many
apps' ads had expired. This can be due either to the app-level configuration (not being able to
enter the app before upgrading it) or to the ad library settings (requiring ads to be loaded first
before accessing app content, while some ad servers reject the ad request and return a null value
if the app contains an older version of the ad library). However, the goal is to evaluate whether
the 21 subject apps are representative of top-ranked apps, and the 369 successfully evaluated apps,
as argued in Section 6.4, are sufficient to validate such representativeness.
Nonetheless, I acknowledge several limitations to the generalizability of my results. First, my
work targeted only Android apps, so the results may not generalize to iOS-based apps, which
also represent a significant portion of the app marketplace. However, since the underlying
mechanisms of ad display are similar, I expect that we would see similar results. Second, I biased
my app selection toward popular and successful apps. It is possible that less successful apps have
different hidden costs. I hypothesize that, since these apps are less successful, it is likely that
they have higher hidden costs, so my results would represent a lower bound on the average hidden
costs. However, exploring this hypothesis is something that can be done in future work.
Internal Validity: In my study, I used different tools or commands to measure each metric
on the smartphone. I chose standard tools that have been used in previous research studies. To
ensure the reliability of my results, I repeated the measurements four times. Below I present details
about the tools and the steps I took to mitigate specific threats to internal validity.
First, I instrumented Java bytecode that was generated from Dalvik bytecode by reverse
engineering. The dex2jar, asm, and dx tools may introduce differences between the instrumented
version and the original app. To address this threat, I ran the tools on both versions (with-ads
and no-ads) so that any effect would be consistent.
Second, I used RERAN to record and replay workloads in my experiments. However, it
is difficult to execute the two versions (i.e., with-ads and no-ads) of an app in exactly the same
environment. To mitigate the effects of the environment on the final results, I adopted different
strategies, as described in Section 6.2. I measured the cost of mobile ads through several groups
of runs for each app. The average time difference between two consecutive runs under the same
workload was 0.32%, with the highest value being 0.78%, which is negligible compared to the
total duration of each run. This indicates that, for the most part, I was able to maintain similar
execution environments for the apps' replay.
Third, the energy consumption of mobile ads was modeled and estimated. This was done by
predicting the energy consumption of the with-ads and no-ads versions of an app. To measure the
model's accuracy, in my prior work [43] I used the Error Estimation Rate (EER), which is defined
as the percentage difference between the estimated values and the ground truth. In the experiments,
I sampled a total of 77 execution traces of mobile ads. I then applied linear regression analysis with
10-fold cross validation. The correlation coefficient for the output was 0.964, which shows a strong
linear relationship in the model. After applying the generated model to estimate the energy cost,
the overall average error rate was 14%. This value is roughly comparable to the accuracy results
(e.g., 10% in [48, 65]) obtained by other energy estimation techniques. The results indicate that
I can obtain accurate energy information about ads.
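In equation form, for each sampled trace the EER compares the model's estimate against the
measured ground truth (this is my restatement of the definition above):

    EER = \frac{|E_{estimated} - E_{measured}|}{E_{measured}} \times 100\%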
Construct Validity: The goal of my study is to measure the hidden cost of mobile ads.
I investigated this by defining and measuring several metrics for end users and developers. In
Section 6.2, I explained why I chose these metrics. However, the relative importance of these
metrics may vary in different usage and development contexts, and it is likely that there are other
cost metrics, not addressed in my work, that may be important in other settings. Thus, the cost
of ads presented in this chapter is a lower bound. I hope that this work will encourage further
identification and quantification of additional hidden costs.
Content Validity: My experiments and conclusions assume that developers would want to
minimize costs to end users. An intriguing question and threat to the validity of my conclusions
is that developers may disregard these costs in order to drive end users to paid ad-free versions
of their apps. Related work has shown that paid versions of apps have significantly lower costs in
terms of network usage [110]. However, it is not clear if this is intentional and developers must
still strike a balance between ad placement and the quality of the user experience to avoid driving
away potential paying customers.
6.6 Conclusion
Millions of smartphone users install and use thousands of free apps, which are often monetized
by their developers via mobile advertisements. Based on the non-UI-related ad topics I categorized
from user reviews in Chapter 4, I postulate in this chapter that these apps are merely free to
download and in fact have several forms of hidden costs due to the ads. I carry out experiments
on 21 highly rated and actively maintained Android apps to investigate these costs. I find that the
costs of ads in terms of performance, energy, and bandwidth are substantial. The median relative
CPU, memory, energy, and bandwidth costs of ads are 56%, 22%, 15%, and 97%, respectively. I
also find that 50% of the apps have ad-related changes in almost one out of every four releases,
and that 4% of all complaints in the reviews of the apps in the Google Play app store are with
respect to ads. Although this is an intuitive observation, such results have not been formally
demonstrated and are hard to quantify without extensive measurement infrastructure. I believe
that my study provides strong evidence of hidden costs due to ads in apps, and that developers
need to use ads optimally in their apps. The take-home message of my study is that both the
research community and the ad library networks need to take these costs into consideration and
make ads more cost efficient.
Chapter 7
Related Work
In this chapter I present related work that focuses on different aspects of mobile advertisements.
7.1 Study of Ad Costs
Researchers have studied one or more cost indicators and examined how to improve mobile ads.
Wei and colleagues [110] quantified the network usage and system calls related to mobile ads
based on carefully constructed rules (e.g., designing and implementing a multi-layer system for
monitoring and profiling Android apps) and quantified the differences between free and paid
versions. In their evaluation, they observed that many apps communicated with an unexpectedly
high number of sources (as many as 13) in an unencrypted manner, and that a significant increase
in ad traffic could make free apps end up costing more than their paid counterparts. Vallina-Rodriguez
and colleagues [104] analyzed a large amount of ad traffic data from a major European mobile
carrier and categorized mobile ad traffic according to different dimensions, such as overall traffic,
frequency, etc. In addition, they used a custom-built app with an ad slot at the bottom of the
screen to evaluate the energy consumption of three popular ad networks. Nonetheless,
only pure ad traffic was evaluated with respect to the energy cost, and the impact of background
traffic was unknown. Their findings confirmed that mobile ad traffic was significant and responsible
for important energy and network overhead. They also demonstrated that the mobile ad ecosystem
was mainly dominated by Google services (e.g., AdMob). Pathak and colleagues [85] implemented
Eprof to measure the energy consumption of mobile ads in free apps. By analyzing the internal
energy dissipation of these apps, they found that a large portion of the energy in free apps was
spent in third-party advertisement modules.
7.2 Study of Ad Libraries
Previous studies of mobile advertising have also focused on the ad library, which is used as a
separate package embedded into the application. Mojica Ruiz and colleagues [80] examined the
impact of ad libraries on the ratings of Android mobile apps, and found that integrating certain
ad libraries could negatively impact an app's rating. They also carried out a broad empirical
study [90] on ad library updates in Android apps. The results showed that ad library updates
were frequent, and suggested substantial additional effort for developers to maintain ad libraries.
Grace and colleagues [41] developed a system to systematically identify potential risks exposed by
mobile in-app ad libraries. Li [68] investigated the use of common libraries in Android apps, and
collected from these apps 240 libraries for advertisement. Liu and colleagues [72] explored efficient
methods to de-escalate privileges for ad libraries in mobile apps. The system they developed
contained a novel machine classifier for detecting ad libraries. Book and colleagues [18] observed
a steady increase in the number of permissions that Android ad libraries were able to use, after
examining a sample of 114,000 apps.
7.3 Study of Ad Surveys
Another group of related work conducted surveys and proposed different methods or models to
identify factors that influence consumers' responses to mobile ads. Leppaniemi and colleagues [62]
investigated factors such as the marketing role of the mobile medium, the development of technology,
the one-to-one marketing medium, and regulatory issues, which influenced the acceptance of
mobile advertising from both industrial and consumer points of view. With this information they
built a conceptual model of consumers' willingness to accept mobile advertising. Blanco and
colleagues [16] suggested entertainment and informativeness as precursory factors of successful mobile
advertising messages, after an empirical study using structural modeling techniques. Henley and
colleagues [47] conducted a study to investigate college students' exposure to and acceptance of
mobile advertising in the context of an integrated marketing communication strategy and found
that incentives were a key motivating factor for advertising acceptance. Xu and colleagues [112]
launched surveys to identify factors that influenced consumers' responses to mobile ads, and several
other methods or models [62, 97, 103] have likewise been proposed to identify such factors.
7.4 Study of User Reviews
Similar to the surveys conducted in the related work above, several studies have been carried out on
user reviews of mobile apps [53, 59, 83]. Palomba and colleagues [84] devised an approach, named
CRISTAL, for tracing informative crowd reviews to source code changes, and for monitoring
the extent to which developers accommodated crowd requests and follow-up user reactions as
reflected in their ratings. AR-Miner [20] was presented as a novel computational framework for
app review mining, which performs comprehensive analytics on raw user reviews to help mobile
app developers discover the most "informative" user reviews from a large and rapidly increasing
pool of user reviews. WisCom [30] was proposed to analyze a large corpus of user ratings and
reviews in mobile app markets at different levels of detail, such as discovering inconsistencies in
reviews and identifying reasons why users like or dislike a given app. In these studies, the authors
manually or automatically examined the reviews to determine what users were complaining about
and whether the users were requesting features in the reviews. Sarro and colleagues [94] built
a prediction model using feature and rating information from existing apps. Based on
the model, customer rating reactions could be predicted purely from app features. Asiri and
Chang [12] investigated users' experiences and attitudes towards mobile apps' reviews in a
recent study. They found that, besides an app's rating and download statistics, users tended to
use self-judgment to determine the app's quality. Both studies tried to correlate users' reactions
with app reviews or ratings. Similar to their work, I also examined reviews in Chapter 4. However,
I focused on ad-related reviews instead of app-level reviews, and on the relationship between mobile
ad aspects and app ratings.
7.5 Study of Ad Optimization
Other related work aims at the improvement of the current ad mechanisms in terms of the energy
consumption and other metrics. Mohan and colleagues [79] proposed an approach to support
ad prefetching with minimal changes to the existing advertising architecture, while reducing the
ad energy overhead with a negligible revenue loss. Vallina-Rodriguez and colleagues [104] im-
plemented a prototype with prefetching and caching techniques and showed an improvement on
Android devices in terms of both the energy consumption and network usage of ads. Khan and
colleagues [60] outlined the principles of a middleware that used predictive profiling of a user's
context to anticipate the advertisements that needed to be served. They reported a non-negligible
amount of network traffic generated by ads. Wang and colleagues [109] proposed an app advertising
auction mechanism based on optimal keyword auctions. Their results showed that the mechanism
could better guide app advertising.
7.6 Study of Ad Fraud
Related work has implemented tools or systems to detect some kinds of ad fraud. Liu and col-
leagues [73] designed DECAF, a scalable and effective system for discovering placement fraud in
mobile app advertisements. DECAF interacts with mobile apps through Monkey on Windows-
based mobile platforms. The goal of DECAF is to detect banner ad placement fraud on Windows
Phone apps. Hao and colleagues [50] proposed PUMA, a framework with the capability of de-
tecting some types of ad fraud, similar to DECAF. In particular, PUMA detected ad fraud by
comparing the WebView size with the minimum allowed ad size in mobile apps. PUMA was de-
signed for scalable and programmable UI exploration of Android apps and customized for various
types of dynamic analysis. Crussell and colleagues [25] implemented MAdFraud, an analysis tool
to expose two types of fraudulent ad behaviors in apps: requesting ads while the app is in the
background, and clicking on ads without user interaction. By analyzing HTTP requests, MAd-
Fraud was able to identify ad fraud. Gibler and colleagues [33] conducted a large-scale evaluation of
free apps from different Android markets and used it to characterize plagiarized apps and estimate
their impact on the original app developers. Through captured HTTP advertising traffic,
they estimated a lower bound on the ad revenue that cloned apps siphoned from the original
developers, and the user base that was diverted from the original apps to clone apps. Alrwais and
colleagues [8] experimented with an ad fraud infrastructure to understand modalities of ad fraud
via misdirected human clicks, and presented a detailed account of the attackers' modus operandi.
They also studied the impact of this attack on real-world users.
Another group of related work proposed different methods or models to study ad fraud.
Mungamuru and colleagues [81] presented an economic model of the online advertising market.
The outcome of the model was to determine whether ad networks had an incentive to aggressively
combat ad fraud. Vratonjic and colleagues [107] provided a game-theoretical model to study the
strategic behavior of internet service providers and ad networks against botnet ad fraud. Their
analytical and numerical results showed that the optimal strategy depended on the ad revenue
loss of the ad networks due to ad fraud and the number of bots participating in ad fraud.
7.7 Study of Ad Security
Some other research has been done concerning the security and privacy aspects of mobile ads.
Shekhar and colleagues [95] proposed AdSplit, which guaranteed protection and integrity to adver-
tisers by separating mobile ads from applications. Pearce and colleagues [86] presented AdDroid
to separate privileged advertising functionality from host applications. Stevens and colleagues
[99] studied the effect of Android ad libraries on user privacy. Liu and colleagues [72] explored
efficient methods to de-escalate privileges for ad libraries in mobile apps, where the resource access
privileges for ad libraries could be different from those of the app logic. King and colleagues [61]
studied privacy concerns of profiling mobile customers by behavioral advertisers. They identified
potential harms to privacy and personal data related to profiling for behavioral advertising. Mobile
ads are also leveraged for analysis from different perspectives [26, 76, 89, 96], such as the
detection of hidden attacks, the exploration of user data exposure, and the leakage of sensitive
user information. All these studies are complementary to mine in understanding and contributing
to a better mobile advertising ecosystem.
Chapter 8
Conclusion and Future Work
In this chapter, I conclude by giving a summary of my dissertation work and discussing directions
for future work.
8.1 Summary
Mobile applications have become an important part of our daily lives for the purpose of both
work and personal activities, such as chatting, gaming, and email. It is very convenient for us to
find an app that we need based on the app category it belongs to. Meanwhile, the "free
app" distribution model has been extremely popular with end users and developers. Developers
use mobile ads to generate revenue and cover the cost of developing these free apps. However, the
improper use of ads can also become a source of complaints and bad ratings for the app, which
can significantly reduce the app's chance of success, and eventually cut into potential advertising
revenue that could be earned by developers. Despite the importance of ad usage in mobile apps,
no studies exist of how to assist software developers regarding mobile advertising. For example,
while there is evidence that users do not like ads, we do not know which aspects of ads users
dislike, nor whether they dislike certain aspects of ads more than others. As another example,
there is very little guidance available for developers concerning how, where, and when to display
ads in their apps, and most advice in blogs or provided by Mobile Ad Networks (MANs) tends
to be anecdotal, too general, or not supported by quantitative evidence.
To address these limitations, the goal of my research is to utilize user feedback to assist
software developers to better use ads in their apps. My dissertation work furthers this goal by
systematically investigating user feedback (in terms of both reviews and ratings) and, based on
the extracted topics, developing various dynamic and static analyses to uncover the underlying
relationships between different ad aspects and app ratings. The hypothesis statement of my
dissertation is:
User feedback can be utilized to analyze the impact of different ad aspects on apps.
To evaluate the hypothesis of my dissertation, I designed a systematic approach that provides
developers with a set of guidelines for ad implementation.
The first piece of my dissertation work is an empirical study of mobile in-app ad reviews. In
this work, I investigated app reviews in the app store that were about ads. In particular, I manually
examined a valid sample set of 400 ad reviews that were randomly selected from a large corpus
of user reviews to identify ad-related topics in a systematic way. The goal was to identify which
aspects of mobile ads users care about. I found that most ad reviews were complaints about
UI-related topics, and three topics were brought up most often: the frequency with which ads
were displayed, the full-screen size of pop-up ads, and the location where ads were displayed. For
non-UI ad functions, I found the most complained-about topics were ads displayed in the so-called
paid version of an app, and ads affecting the running of app-level functionality.
The second piece of my dissertation work is to quantify non-UI-related ad aspects and analyze
their relationship with app ratings. To do this, I studied 21 real-world Android apps and found
that the costs of ads in terms of performance, energy, and bandwidth were substantial. The
median relative CPU, memory, energy, and bandwidth costs of ads were 56%, 22%, 15%, and 97%,
respectively. I also found that complaints about these costs were significant and could impact the
ratings given to an app.
The third piece of my dissertation work is to quantify UI-related ad aspects and analyze their
relationship with app ratings. To do this, I developed program analyses to quantify visual aspects
related to ad topics and then statistically analyzed whether there existed a relationship between
certain kinds of ad usage and the ratings assigned to the app by end users. From my findings, I
distilled a set of recommendations to help app developers use ads more effectively in their apps.
Overall, I found that lower ratings were generally associated with apps that (1) used interstitial
ads instead of banner ads; (2) had ad frequency ratios close to 0.77; (3) used more than one ad
per activity; (4) made ads larger in height, closer to 366 pixels; (5) had ads appear in the middle
or bottom of the activity; (6) placed ads on the initial landing page of an app; or (7) had repeated
ad content.
In summary, my investigation has demonstrated that user feedback contains useful information
about different ad aspects that have an impact on app ratings, thereby confirming the hypothesis
of my dissertation. To the best of my knowledge, my research is the first effort to identify ad-
related topics from user reviews through a systematic approach, and the first to quantify the most
common ad-related topics and analyze their impact on app ratings.
8.2 Future Work
In the future, studying ad aspects to improve ad implementation will continue to be an important
challenge for developers, not just in the domain of mobile applications but in other domains as
well, such as web applications. My dissertation identifies several real-world challenges in the domain
of mobile advertising that motivate different directions of research and also lays the foundation for
developing automated techniques that can quantify different ad aspects and determine the impact
of these aspects on app ratings.
The usage of ads in mobile apps represents a tradeoff. On the one hand, developers want to
insert more ads to increase revenue, but, on the other hand, they want to avoid degrading the
user experience to the point that their apps' success could be jeopardized. My work has focused
on one aspect of this tradeoff problem, namely, what types of ad usage are more likely to lead
to lower ratings from users. Within this focus, future work can reuse my methodology to explore
the impact of additional kinds of ad usage. More broadly, my work has laid the basis for future
work that can also explore the other aspects of this tradeoff, such as the relationship among ad
usage patterns, end user interaction with the ads, and ad revenue. Results in these areas would
help developers more fully understand how to best make use of ads while also improving the
user experience.
Another interesting avenue of future work would be to explore the causal relationships behind
many of the correlations I identified: for example, why ads in certain positions typically lead
to lower ratings, or why repeated ad content seems to bother users so much. Such work could
examine these relationships from a user interface or psychological perspective. Establishing
the causal relationships could lead to a refinement of the guidelines and/or a new understanding
of factors to consider when designing apps with ads.
Abstract
In the mobile app ecosystem, developers receive ad revenue by placing ads in their apps and releasing them for free. While there is evidence that users do not like ads, we do not know the aspects of ads that users dislike nor if they dislike certain aspects of ads more than others. Therefore, in the first part of this dissertation work, I analyzed the different ad-related topics in app reviews from users. In order to do this, I investigated user reviews of apps that contained complaints about ads (ad reviews). I examined a sample set of 400 ad reviews randomly selected to identify ad-related topics in a systematic way. I found that most ad complaints were about UI-related topics and the three topics discussed predominantly were: the frequency with which ads were displayed, the timing of when ads were displayed, and the location of the displayed ads. I also found users reviewed non-UI aspects of mobile advertising, such as ad blocking or slowing down the host app's operation. My results provide actionable information to software developers regarding the aspects of ads that are most likely to be complained about by users in their reviews.

The results of the above work indicate that although the apps are ostensibly free, they do, in fact, come with costs. To analyze ad costs related to non-UI aspects, I designed a systematic approach to study 21 real-world Android apps. The results showed that the use of ads led to the consumption of significantly more network data, increased energy consumption, and repeated changes to ad-related code. I also found that complaints about these costs were significant and could affect the rating (on a scale of one to five stars) given to an app. My results provide actionable information and guidance to software developers in weighing the tradeoffs of incorporating ads into their mobile apps.

In the third part of my dissertation work, I systematically investigated UI aspects of mobile ads. My prior results showed the improper use of ads could become a source of complaints and bad ratings for the app. Hence, developers must know how, where, and when to display ads in their apps. Unfortunately, very little guidance is available for developers, and most advice tends to be anecdotal, too general, or not supported by quantitative evidence. To address this, I investigated UI-related ad topics, which my prior work had identified as the most common type of ad complaint. To carry out this investigation, I developed analyses to quantify aspects related to these UI topics and then analyzed whether there existed a relationship between these values and the ratings assigned to the app by end users. I found lower ratings (with statistical significance) were generally associated with apps that had different visual patterns regarding ad implementation, such as ads in the middle or at the bottom of the page, and ads on the initial landing page of an app. Based on the results, I created a set of guidelines to help app developers more effectively use ads in their apps.