USC Computer Science Technical Reports, no. 927 (2012)
The Case for Complexity Prediction in Automatic
Partitioning of Cloud-enabled Mobile Applications
Luis D. Pedrosa, Nupur Kothari,
Ramesh Govindan
Embedded Networks Laboratory
University of Southern California
{luis.pedrosa, nkothari,
ramesh}@usc.edu
Jeff Vaughan, Todd Millstein
Computer Science Department
University of California, Los Angeles
{jeff, todd}@cs.ucla.edu
ABSTRACT
As application demands out-pace the evolution of bat-
tery technology, many smartphone “app” developers will
soon explore offloading compute-intensive tasks to the
cloud. Such cloud-enabled mobile applications effec-
tively partition application functionality between the phone
and the cloud. Application partitioning must be dynamic,
to successfully adapt to variability in resource availabil-
ity. Dynamic partitioning systems rely on the ability to
predict the resource usage of an application's components, for
which prior work has used simple approaches. In this
paper we propose the use of complexity metrics that
enhance these predictions by taking into account rele-
vant properties of each component’s input, both general-
purpose (e.g., size) and type-specific (e.g., number of
words in an audio sample). Our predictors improve the
energy efficiency of partitioning a speech recognition li-
brary by 21% or more.
1. INTRODUCTION
As smartphones gain popularity, mobile applications,
or “apps”, are rapidly becoming the cornerstone of what
defines the user’s experience. Already, there are apps
available for various phone platforms that provide useful
capabilities like speech recognition, restaurant recom-
mendations, automatic logging of workouts, etc. Many
of these capabilities place large demands on smartphone
resources. Thus, developing these applications requires
achieving a delicate balance that ensures application us-
ability without significantly degrading battery lifetime.

(This research was supported by the U.S. National Science Foundation, award No. 1048824.)

(Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. University of Southern California, Computer Science Technical Report, June 2012, Los Angeles, USA. Copyright 2012.)
To address this challenge, many developers have started
using cloud resources to run compute-intensive applica-
tions or libraries. This requires significant programming
effort, since developers now have to decide manually,
for every application, which application components run
on the phone and which on the cloud. Moreover, in
many cases, it is not clear if the same partitioning strat-
egy for a given app will work for all network conditions
and inputs. Thus, there is a need for a dynamic offload-
ing strategy that takes into account the cost of executing
a component on the phone vs. the cloud, and attempts
to reduce energy consumption without significantly im-
pacting application performance.
Any such offloading strategy must manage complex
tradeoffs. For example, moving a computation to the
cloud reduces CPU energy consumption but increases
energy use by the networking hardware. In general, the
relative costs and benefits of offloading may depend on
hardware (how expensive is 3G on this phone?), context
(is WiFi available?), and workload (how large is the
input image?).
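To make these trade-offs concrete, consider a minimal back-of-the-envelope check (a sketch with invented energy constants, not measurements from this paper): offloading pays only when the radio energy spent shipping the input is below the CPU energy saved.

```python
# Hypothetical offloading break-even check. The energy model (a fixed
# radio wake-up cost plus a per-MByte transfer cost) and all constants
# are illustrative assumptions, not values measured in this paper.

def should_offload(input_bytes, local_cpu_joules,
                   radio_joules_per_mb, fixed_radio_joules):
    """True if shipping the input to the cloud costs less energy than
    running the computation on the phone's CPU."""
    tx_joules = fixed_radio_joules + radio_joules_per_mb * input_bytes / 1e6
    return tx_joules < local_cpu_joules

# Cheap WiFi link: a 1 MB input is worth sending.
print(should_offload(1_000_000, 5.0, radio_joules_per_mb=0.5,
                     fixed_radio_joules=0.1))   # → True
# Expensive 3G link with a high ramp-up cost: compute locally instead.
print(should_offload(1_000_000, 5.0, radio_joules_per_mb=2.0,
                     fixed_radio_joules=4.0))   # → False
```

Note that this ignores the remote execution and result-download costs, which a full partitioner must also account for.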
Prior work has designed execution frameworks to
help with the problem of partitioning application com-
ponents between the phone and the cloud [1, 2, 6]. Some
of these [1, 2] use runtime profiling to predict the re-
source usage of a component and hence the cost of exe-
cuting it on the phone or the cloud, and base their parti-
tioning decision on this. While this is a great first step,
these systems use simple predictors, like the average
computation cost of a piece of code across all inputs
seen so far. These predictors can result in suboptimal
performance since the computational resource usage of
many components can depend on the input. For exam-
ple, the resource usage of a speech-recognition system
can depend on input complexity metrics, such as input
size, or the number of distinct words in the audio sample.

Figure 2: Component Execution Times (execution time in seconds for each of the 9 components, for a small and a large input)

Figure 3: Inter-component Data Transfer Sizes (average size, in kBytes, of the data transferred between adjacent components, for a small and a large input)
In this paper, we propose to use predictors based on
input complexity metrics to improve the performance of
application partitioning decisions. We motivate the need
for input complexity metrics and dynamic partitioning,
using an example in Section 2. We present a frame-
work for resource usage prediction that can be easily ex-
tended by users to incorporate new application-specific
input complexity metrics (Section 3). We combine this
with a simple partitioning framework to build a proof-
of-concept remote execution system. We demonstrate
that for a speech recognition library, our framework im-
proves performance even with the simple, application-
agnostic complexity metric of input size (Section 4).
2. MOTIVATING EXAMPLE
We motivate the need for a dynamic partitioning scheme
for smartphone applications that takes into account in-
put complexity, using a real world example. We present
here a brief experiment using a speech recognition li-
brary based on a port of CMU Sphinx4 [7] to Android.
Internally, the speech recognition system can be broken
into a linear pipeline of components, as illustrated in
Figure 1. Figures 2 and 3 respectively show run times
for each of these components, as well as the size of the
data transferred between them, when executed for two
audio samples of different sizes.
These figures show that the execution time and data
transfer size varies greatly with the inputs. For both
these inputs, while component 9, the Recognizer, is a
clear candidate for off-loading, the optimal point to par-
tition the application is not immediately clear. If the
phone were close to a WiFi access-point with great re-
ception (and consequently a low per-bit energy cost), it
may very well be worth sending the raw audio data to
the server where it can be processed almost instantly.
If, on the other hand, the phone were using the cellu-
lar network with very poor reception, a more intelligent
strategy might be to extract the audio features on the
phone and leave only the final recognition to run in the
cloud. Finally, this decision of where to partition might
vary with the input, since the exact trade-offs might be
different for different audio samples. As shown in the
figures, different inputs can vary considerably in the per-
component processing times and data-transfer sizes they
incur.
Hence we can see that, given the significant variabil-
ity one can expect to encounter at runtime due to var-
ious network conditions and different inputs, no single
static partitioning can be optimal. Furthermore, since
the decision of where to execute a component must be
made before its execution, its resource usage must be
predicted in order to compare the respective costs of ex-
ecuting it locally vs. transferring the data for remote ex-
ecution. Since inputs have a significant impact on these
costs, these predictors must be a function of the input:
in our approach, they depend on pre-determined input
complexity metrics.
3. DESIGN AND IMPLEMENTATION
In this section, we present the details of a prediction
system that uses a vector of input complexity metrics
to predict resource usage of various application com-
ponents. We also describe briefly our implementation
of a basic remote execution framework that we use to
demonstrate the utility of this prediction system.
3.1 Prediction System
The goal of the prediction system is to predict com-
ponent resource usage, so that the remote execution frame-
work can make informed partitioning decisions. An im-
portant design goal is to ensure that the system is modu-
lar, and easily extensible to new user-defined or application-
specific input complexity metrics.
We represent input complexity as a vector C whose
components describe metrics. We model a component's
resource use by a polynomial P(C) and, for simplicity,
assume P(C) = k·C, where k is a vector of constants. The
coefficient vector k is learned automatically using test
inputs during an initial training phase. In practice, both the
complexity metrics and the test inputs would be speci-
fied by the user. We believe this does not represent a
large burden on developers, since the metrics can likely
be defined on a type-specific basis (e.g., audio, video,
etc.) and shared across many components that accept
Figure 1: Sphinx4 Component Diagram (1: Audio Source → 2: Preemphasizer → 3: Windower → 4: FFT → 5: Filter Bank → 6: DCT → 7: CMN → 8: Feature Extraction → 9: Recognizer)
the same type of data as input. Furthermore, developers
already test and debug their applications on test inputs.
To train or use predictors, a developer must specify
a metric function M that computes complexity metric
scores from inputs. Ideally, metric functions should be
lightweight, and the developer may need to balance met-
ric utility with overhead. The predicted cost of running
a component is naturally computed by
cost = P(M(input)).
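As a concrete sketch of this scheme (in Python for brevity, although the actual framework is implemented in Java; the function names and training data here are invented for the example), the coefficient vector k can be learned by ordinary least squares over the metric vectors of the training inputs:

```python
import numpy as np

# Sketch of the prediction scheme: a metric function M maps an input to
# a complexity vector C, and cost is predicted as P(C) = k.C, with k
# fit by least squares from observed training-run costs. Names and
# numbers are illustrative, not this system's API.

def metric_size_only(data: bytes):
    # Type-agnostic metric: [1, input size]. The constant 1 lets the
    # fit absorb a fixed per-invocation cost.
    return np.array([1.0, float(len(data))])

def train_k(inputs, observed_costs, M):
    C = np.vstack([M(x) for x in inputs])   # one row per training input
    k, *_ = np.linalg.lstsq(C, np.asarray(observed_costs), rcond=None)
    return k

def predict_cost(k, M, data):
    return float(k @ M(data))               # cost = P(M(input))

# Toy training set where cost is exactly 0.5 J fixed + 2 J per byte:
xs = [b"a", b"abc", b"abcdef"]
ys = [2.5, 6.5, 12.5]
k = train_k(xs, ys, metric_size_only)
print(round(predict_cost(k, metric_size_only, b"ab"), 2))  # → 4.5
```

A type-specific metric (e.g., word count in an audio sample) would simply return a longer vector from its metric function; the training and prediction code is unchanged.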
Defining M and P independently allows application
developers to create type-specific complexity metrics that
extract any relevant information from inputs, while also
allowing any algorithm-specific dependencies and platform-
specific resource costs to be captured in the polynomial
coefficients learned during the training phase. Further-
more, type-agnostic input complexity metrics, such as
input size and entropy, can also be used for opaque data
types where the developer has not provided an otherwise
more relevant metric. We have implemented this pre-
diction system within a simple remote execution frame-
work that we describe next.
3.2 Remote Execution Framework
To aid in testing the utility of input complexity met-
rics for building efficient cloud-enabled smartphone ap-
plications, we have developed a basic remote execution
framework that allows for dynamic partitioning. While
this framework, in itself, is not the focus of this paper, a
brief explanation of how it operates is given for con-
text. This framework, based on message passing be-
tween state-less components, can be used to develop
both self-contained libraries, as well as entire applica-
tions. The main focus of this effort is to allow function-
ality to be implemented in a platform agnostic fashion,
such that it can be automatically and transparently mi-
grated between the phone and the cloud. Given such
a system, the developer needs only to implement the
library or application as a set of interconnected com-
ponents and the framework runtime automatically de-
cides how to best partition its execution and transpar-
ently marshals data and executes the appropriate com-
ponent where it is needed. In its current implementa-
tion, both the framework and the components are imple-
mented in Java and the runtime runs on a standard J2SE
virtual machine on the server and in the Dalvik virtual
machine on the Android smartphone platform.
The remote execution framework uses a simple al-
gorithm to make partitioning decisions. The purpose
of this algorithm is to find a partition that globally op-
timizes the application’s total cost according to some
user specified metric (run time or energy), taking into
account both the cost of locally executed components
and the cost of marshaling data to and from the cloud
for remotely executed components as computed by the
prediction system. In our framework, this algorithm is
implemented online, in the sense that the partitioning
decision can be revised after executing each component,
using the new information produced to provide more ac-
curate predictions.
Although there are multiple possible partitioning al-
gorithms, depending on the component graph structure
and the kind of predictions that are to be relied upon, we
use a simple greedy approach that at every point in time
tries to optimize the cost of handling the next component
a message is destined to. While this algorithm is simple
and light-weight, it may lead to sub-optimal decisions
since it does not examine more than one component in
the future. However, we find in our experiments that,
for our specific speech recognition application, even
this simple algorithm provides good results.
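The one-step greedy rule can be sketched as follows (a simplification with scalar placeholder costs standing in for the prediction system's outputs; this is not the framework's actual API):

```python
# Greedy partitioning decision for the next component a message is
# destined to. Costs are scalars in a user-chosen metric (energy or
# time); in the real system they would come from the predictors.

def greedy_place(local_exec_cost, remote_exec_cost, transfer_cost, on_phone):
    """Return where to run the next component, comparing the cost of
    executing it where its input already is against the cost of
    marshaling the input across and executing it on the other side."""
    if on_phone:
        return "phone" if local_exec_cost <= transfer_cost + remote_exec_cost else "cloud"
    else:
        return "cloud" if remote_exec_cost <= transfer_cost + local_exec_cost else "phone"

# A cheap component behind an expensive radio stays local...
print(greedy_place(1.0, 0.1, 5.0, on_phone=True))   # → phone
# ...while a heavy Recognizer-style component justifies the transfer.
print(greedy_place(20.0, 0.5, 3.0, on_phone=True))  # → cloud
```

Because only the next component is examined, this rule can defer offloading too long when fixed radio costs are high, which matches the 3G behavior reported in Section 4.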
4. EXPERIMENTAL EVALUATION
In this section we evaluate the effectiveness of the
proposed system through a series of controlled experi-
ments.
We ported the CMU Sphinx4 speech recognition li-
brary [7] for use with our remote execution framework,
componentizing its internal functionality as shown in
Figure 1. The component diagram represents a linear
pipeline. In any optimal partitioning, either the entire
pipeline will execute on the phone, or a contiguous chunk
of it will execute in the cloud. We built a simple An-
droid application that, given an audio sample, uses our
Sphinx4 port to perform speech recognition and returns,
in text form, the words in the audio sample.
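Since any optimal partition of a linear pipeline offloads at most one contiguous chunk, the post facto optimal baseline can be computed by brute force over all chunks. The sketch below uses invented per-component costs, not the measured ones:

```python
# Brute-force "Optimal" partitioner for a linear pipeline, assuming any
# optimal partition offloads one contiguous chunk [i, j) to the cloud.
# All cost numbers below are invented for illustration.

def optimal_partition(phone_cost, cloud_cost, xfer_cost):
    """phone_cost[c] / cloud_cost[c]: cost of running component c
    locally / remotely; xfer_cost[c]: cost of shipping the data that
    flows into component c (xfer_cost[n] is the final output, which
    must be shipped back to the phone)."""
    n = len(phone_cost)
    best_cost, best_chunk = sum(phone_cost), (None, None)  # all-phone
    for i in range(n):
        for j in range(i + 1, n + 1):
            total = (sum(phone_cost[:i])      # components before the chunk
                     + sum(cloud_cost[i:j])   # the offloaded chunk
                     + sum(phone_cost[j:])    # components after the chunk
                     + xfer_cost[i]           # upload the chunk's input
                     + xfer_cost[j])          # download the chunk's output
            if total < best_cost:
                best_cost, best_chunk = total, (i, j)
    return best_cost, best_chunk

# Toy 3-stage pipeline with a heavy final (Recognizer-like) stage whose
# input is small: offloading just the last component wins.
print(optimal_partition([1, 1, 20], [0.2, 0.2, 0.5], [5, 5, 2, 1]))  # → (5.5, (2, 3))
```

This is the shape of the omniscient "Optimal" baseline used later; the online greedy partitioner instead decides one component at a time.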
We conducted experiments using an Android Nexus
One phone. We used a simple input size complexity
metric for prediction. We first trained our prediction
system using a corpus of 10 short audio samples. We
then ran the speech recognition application on the Nexus
One phone for 10 different audio samples, collecting de-
tailed logs that allowed us to determine for each exe-
cution cycle, the predictions made, the partitioning de-
cision made, and finally, the cost of execution of each
component (measured both in energy and time).
Figures 4, 5, and 6 evaluate the accuracy of the pre-
dictions for three different quantities: energy consump-
tion, execution time, and output size. Even though we
used a generic input size complexity metric, the predic-
tions do show a rather strong correlation (0.74, 0.73,
and 0.99, respectively, for energy consumption, execution
time, and output size), despite some noise due to system
dynamics. Indeed, for the output size predictions (the
only truly deterministic of the three predicted quantities),
the prediction is near perfect.

Figure 4: Predicted and Actual Energy Consumption Scatter Plot (predicted vs. actual energy consumption, in J)

Figure 5: Predicted and Actual Run Time Scatter Plot (predicted vs. actual run times, in s)
Finally, we conducted more experiments to evalu-
ate the optimality of the remote execution system as a
whole. We ran the speech recognition application on
the phone using both the WiFi and the 3G radios, each
of which has a different energy profile, and speed. We
measured the average cost of performing a full execu-
tion cycle and compared it for the following partitioning
strategies.
All-Phone: All of the components were executed
on the phone.
Figure 6: Predicted and Actual Output Size Scatter Plot (predicted vs. actual output size, in MBytes)

Figure 7: Total Energy Costs for Different Partitioning Strategies (WiFi and 3G, for the All-Phone, Expert, Optimal, and Greedy strategies)
Expert: The developer made an educated guess as
to which static partitioning would be the best. All
audio processing and feature extraction was per-
formed on the phone, leaving only the final Rec-
ognizer component to be executed remotely.
Optimal: A post facto offline analysis was per-
formed to determine which would have been the
optimal partitioning, emulating an omniscient par-
titioner.
Greedy: The previously described greedy parti-
tioner was used.
The data shows that, even using a partitioning algo-
rithm as simple as the greedy approach, significant sav-
ings can be achieved as compared to the base-line All-
Phone scenario (95% and 21%, respectively for the WiFi
and 3G scenarios). Indeed, for the WiFi scenario, the
greedy strategy managed to find the optimal partition-
ing while, when using the 3G radio, the solution found
was nearly identical to the expert partitioning. This sub-
optimality is a consequence of the shortsightedness of
the greedy algorithm and the high initial cost of using
the 3G radio. Under the circumstances, the algorithm
continually chooses local execution in an effort to avoid
using the radio entirely. These decisions backfire as ul-
timately, the system recognizes that the final Recognizer
component is too expensive and decides to offload.
5. RELATED WORK
Offloading computation from smartphones and mo-
bile devices in order to reduce their energy footprint, as
well as improve performance, has been explored along
various dimensions. Researchers have considered parti-
tioning the code executing on the device, and offloading
parts to a remote server in an effort to conserve energy
at the mobile device, as well as to improve performance.
Kremer et al. [3] propose a remote execution frame-
work for mobile devices, where the partitioning deci-
sions are made statically, and the code can be partitioned
into at most two parts, i.e., the computation cannot re-
turn to the mobile device. SpatialViews [5] is a program-
ming framework designed for mobile adhoc networks
that requires the users to manually specify static parti-
tions as part of the program. Both of these systems are
geared towards specifically reducing energy consump-
tion, rather than improving performance. Unlike these
works, we believe that the decision of where to partition
should be made dynamically, since it depends not just
on the network conditions, but also on the complexity
of computation which may vary with different inputs.
Many works dynamically decide how to partition and
offload computation. MAUI [2] and Wishbone [4] are
two systems which profile the energy usage and com-
putation costs of various components of a program in
order to decide at runtime how to partition the compu-
tation. MAUI is an execution framework designed for
smartphones, while Wishbone is a framework to build
sensor network applications. In Chroma [1], users pro-
vide a set of “tactics”, each of which is a different way to
implement a mobile application. The runtime chooses
among these depending on current conditions, and de-
cides whether to offload each function in the chosen tac-
tic, using runtime profiling of resource usage for that
function. None of these works consider the effect of in-
put complexity on the cost of a component’s execution.
The authors of Odessa [6], a framework for building in-
teractive perception applications for smartphones, recog-
nize that the complexity of computation depends on the
inputs. They leverage this in the context of their spe-
cific class of applications, along with runtime profiling
to make offloading decisions dynamically. We propose
going a step further and predicting the complexity of an
algorithm and its resource usage to make partitioning
decisions without the overhead of runtime profiling.
6. CONCLUSION
In this paper we made the case for an automatic par-
titioning system for cloud-enabled smartphone applica-
tions that uses input complexity metrics to estimate re-
source usage. Current state-of-the-art partitioning algo-
rithms use simplistic approaches to estimate a compo-
nent’s resource usage, such as using the average value of
previous executions. However, a component’s actual re-
source usage also depends on the input, unaccounted for
in these simple models. By estimating resource usage
using generic and application-specific complexity mea-
sures on each component’s input, in conjunction with
automated learning procedures to assimilate each algo-
rithm’s and each platform’s inherent complexity costs,
more accurate predictions can be achieved, allowing the
partitioner to operate in a more educated fashion.
We showed experimentally that using simple heuris-
tics based on input size, resource usage can be predicted
with a correlation of 0.73 or more. Furthermore, using
this information, a simple greedy partitioning algorithm
was able to optimize a speech recognition library’s execu-
tion, saving 21% or more in energy consumption.
While the initial results shown here are promising,
more work is required to further improve the system.
Better prediction accuracy can be achieved by using more
application-specific heuristics. These will naturally be
developed as we port more applications to use our sys-
tem. As for the partitioning algorithm, while the greedy
approach illustrated here did perform quite well, other
algorithms (based on dynamic programming and linear
programming, for example) may be able to achieve bet-
ter results. These efforts, as well as other usability en-
hancements to the framework, are left for future work.
7. REFERENCES
[1] R. K. Balan, M. Satyanarayanan, S. Y. Park, and T. Okoshi. Tactics-based remote execution for mobile computing. In Proceedings of the 1st International Conference on Mobile Systems, Applications and Services (MobiSys '03), pages 273–286, New York, NY, USA, 2003. ACM.
[2] E. Cuervo, A. Balasubramanian, D.-k. Cho, A. Wolman, S. Saroiu, R. Chandra, and P. Bahl. MAUI: making smartphones last longer with code offload. In Proceedings of the 8th International Conference on Mobile Systems, Applications, and Services (MobiSys '10), pages 49–62, New York, NY, USA, 2010. ACM.
[3] U. Kremer, J. Hicks, and J. M. Rehg. A compilation framework for power and energy management on mobile computers. In International Workshop on Languages and Compilers for Parallel Computing (LCPC '01), 2001.
[4] R. Newton, S. Toledo, L. Girod, H. Balakrishnan, and S. Madden. Wishbone: profile-based partitioning for sensornet applications. In Proceedings of the 6th USENIX Symposium on Networked Systems Design and Implementation, pages 395–408, Berkeley, CA, USA, 2009. USENIX Association.
[5] Y. Ni, U. Kremer, A. Stere, and L. Iftode. Programming ad-hoc networks of mobile and resource-constrained devices. In Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '05), pages 249–260, New York, NY, USA, 2005. ACM.
[6] M.-R. Ra, A. Sheth, L. Mummert, P. Pillai, D. Wetherall, and R. Govindan. Odessa: enabling interactive perception applications on mobile devices. In Proceedings of the 9th International Conference on Mobile Systems, Applications, and Services (MobiSys '11), June 2011.
[7] W. Walker, P. Lamere, P. Kwok, B. Raj, R. Singh, E. Gouvea, P. Wolf, and J. Woelfel. Sphinx-4: a flexible open source framework for speech recognition. Technical report, Mountain View, CA, USA, 2004.