Broadcast Protocols to Support Efficient Retrieval from Databases by Mobile Users

Anindya Datta, The University of Arizona; Debra E. VanderMeer, Tandem Computers; Aslihan Celik, The University of Arizona; and Vijay Kumar, University of Missouri-Kansas City

Mobile computing has the potential for managing information globally. Data management issues in mobile computing have received some attention in recent times, and the design of adaptive broadcast protocols has been posed as an important problem. Such protocols are employed by database servers to decide on the content of broadcasts dynamically, in response to client mobility and demand patterns. In this paper we design such protocols and also propose efficient retrieval strategies that may be employed by clients to download information from broadcasts. The goal is to design cooperative strategies between server and client to provide access to information in such a way as to minimize energy expenditure by clients. We evaluate the performance of our protocols both analytically and through simulation.

Categories and Subject Descriptors: H.2.4 [Database Management]: Systems; H.2.8 [Database Management]: Applications; C.2.1 [Computer-Communication Networks]: Network Architecture and Design--Wireless communication; H.4.3 [Information Systems Applications]: Communications Applications

General Terms: Algorithms, Performance

Additional Key Words and Phrases: Adaptive Broadcast Protocols, Mobile Databases, Energy Conservation, Client-Server Computing

Authors' addresses: A. Datta, MIS Dept., University of Arizona, Tucson, AZ 85721; D. E. VanderMeer, Tandem Computers, 10501 N. Tantau Avenue, Cupertino, CA 95014; A. Celik, MIS Dept., University of Arizona, Tucson, AZ 85721; V. Kumar, Dept. of Computer Science and Telecommunications, University of Missouri-Kansas City, Kansas City, MO 64110.

An early version of this paper previously appeared in Proc. Intl. Conf. on Data Engineering, Birmingham, U.K., April 1997.

1. INTRODUCTION

One of the most important information dissemination strategies in the current fast-paced corporate and personal environments is to employ wireless networks. Furthermore, the data-hungry users who access such networks are expected to be mobile. In fact, a widely accepted information access model envisions millions of highly mobile users carrying small, battery-powered palmtop terminals equipped with wireless connectivity. The types of information that may potentially be accessed are virtually boundless, e.g., financial, weather, traffic, sports and airline schedules.
While the challenges of designing and implementing wireless networks are important ones for the telecommunications research community, data management issues related to organizing and retrieving information from wireless channels have posed challenges for the database community as well. Recent advances in wireless technology have made data access possible for mobile users, who are geographically dispersed. However, there are a number of hardware and software problems which must be resolved before the capabilities of mobile computing can be fully utilized. Some of the software problems, such as data management, transaction management, database recovery, etc., have their origins in distributed database systems. In mobile computing, however, these problems become more difficult to solve, mainly because of the narrow bandwidth of the wireless communication channels, the relatively short active life of the power supplies (batteries) of mobile units, and the changing locations of required information due to client mobility.

A widely accepted paradigm for disseminating information in a disconnected and mobile environment is the use of broadcasts [Vaidya and Hameed 1996a; Barbara and Imielinski 1994; Imielinski et al. 1994a; Imielinski et al. 1994b]. In this model a "server" broadcasts information and "clients" tune in to the broadcast to retrieve their data of interest. The widespread acceptance of the broadcast paradigm is mainly due to the notion of broadcast asymmetry, where the "downstream" (server to client) communication capacity is relatively much greater than the "upstream" (client to server) communication capacity. The two major issues of interest in this scenario are (a) how and what does the server broadcast, and (b) how does the client retrieve? Of particular interest are solutions that enable clients to retrieve their data of interest efficiently, as well as with a minimum of energy expenditure. Of course, the two questions above are composed of several different sub-issues, as explained later in this paper.

In this paper we deal with the problem of data retrieval by mobile units from wireless broadcasts. The goal is to design and evaluate adaptive broadcast protocols to support efficient information retrieval while simultaneously attempting to minimize energy consumption by the mobile units. The paper is organized as follows. We provide a brief review of existing work in the mobile computing area in section 2 and a description of our mobile computing architecture in section 3. We then go on to stipulate a concrete problem statement in section 4, discuss some preliminary information in section 5 and describe our protocols in section 6. Subsequently, we evaluate the performance of our protocols analytically in section 7 and analyze and discuss the performance in section 8. In order to overcome some of the limitations of our analytical model, we further performed an empirical performance study, through simulation. We describe the simulation model in section 9, and discuss the results of our experiments in section 10. We conclude in section 11.

1.1 Contribution of this Paper

In recent times there has been some interesting work investigating data issues in mobile environments (see section 2 for an extended discussion). In particular, very interesting research in the area of broadcasts in mobile environments is reported by Imielinski, Badrinath et al. at Rutgers (see, e.g., [Imielinski et al.
1997; Barbara and Imielinski 1994; Imielinski and Badrinath 1992; Imielinski et al. 1994a; Imielinski et al. 1994b]) and by Vaidya et al. at Texas A&M (see, e.g., [Vaidya and Hameed 1996a; Vaidya and Hameed 1996b]). Most of this work is concerned with access methods, i.e., how to organize the broadcast data. In particular, Imielinski, Badrinath et al. have treated indexing in broadcasts extensively, including how to organize data and indexes in broadcasts. Vaidya et al. report strong results on broadcast scheduling, i.e., when to broadcast particular items, such that response times (termed "access times" in these papers) are minimized. In reporting this work, researchers have suggested various methods of indexing information, or have designed algorithms to maintain cache consistency at clients. This work presupposes that the composition of the broadcast is known. In other words, given the composition of a broadcast, there has been much good work on organizing data and indexes on air, as well as on scheduling broadcasts.

Our concern is different. In [Imielinski et al. 1997], while discussing relevant future work, the authors mention that the most likely mode of retrieval in mobile environments is going to be mixed, i.e., the most frequently accessed items are broadcast and other items are provided on demand. To support such a mixed technique, the authors remark, adaptive broadcast protocols need to be designed. Such protocols will dynamically alter the broadcast content depending upon client demand patterns. In this paper we propose and evaluate, both analytically and empirically, the performance of two such protocols. To the best of our knowledge, there do not exist, in the published literature, other examinations of the same topic.

Aside from the dynamic determination of broadcast content, which concerns the server side, this paper also delves deeply into the client side. Specifically, we address the problem of efficient and power-conserving retrieval by clients. To this end, this paper proposes a number of retrieval protocols that a client might use to download data from broadcasts. Thus, this paper offers an integrated treatment of broadcast content determination and retrieval, achieved by designing cooperative protocols between the server and client sides. This is also a novel aspect of this work. We would like to note, however, that we draw heavily upon existing work. Indeed, this work, together with the work cited above, forms a comprehensive theory of organizing, managing and retrieving from broadcasts in a wireless, mobile environment.

2. RELATED WORK

The primary reference area of this paper is data management in wireless networks in general and in mobile computing in particular. There exists much work in the area of mobile computing, particularly from a telecommunications perspective. This work is somewhat orthogonal to our purposes. Therefore, rather than cite these papers, we refer the reader to the following excellent web site: http://maui.ee.tu-berlin.de/bibl/. This site, at last count, contained over 200 papers on mobile networking and is an excellent source of background material. Our primary interest, however, is the area of data management in the context of mobile computing. This area has received considerable research attention in the recent past.
Good discussions of several data management issues in mobile computing scenarios may be found in [Alonso and Korth 1993; Dunham and Helal 1995; Imielinski and Badrinath 1994; Forman and Zahorjan 1994]. While the aforementioned papers are somewhat high level, there are also papers that deal in detail with specific issues. The architecture of database systems in mobile environments receives detailed treatment in [V. Anantharam and Wei 1994; Malyan et al. 1993]. In [V. Anantharam and Wei 1994], the authors propose a dynamic programming strategy to optimize the mapping of a database hierarchy to the network configuration. In [Malyan et al. 1993], a complete architecture for wireless communication is proposed, which includes a hierarchical database structure to implement the HLR/VLR user location scheme.

Location management is yet another important topic in this area, primarily concerned with designing strategies to keep track of mobile users. In [P. Krishna and Pradhan 1994] a number of search strategies are proposed to locate users. Another notable paper in location management is [D. Lam and Widom 1995], which proposes a framework for modeling and evaluating the performance of location management schemes for a Personal Communication Services (PCS) network. In [Shivakumar and Widom 1994], user profile replication is proposed as a mechanism for rapid location lookup. A recent paper [Satyanarayanan 1996] considers the "span of access" under changing location conditions and offers a summary of research in this area.

Data replication is considered in [Badrinath and Imielinski 1992], which studies cache consistency for replicated data. In [Wolfson and Jajodia 1992], dynamic data replication algorithms are proposed for tree and star networks. Another interesting paper in this area is [Huang et al. 1994], which considers the problem of data allocation across a large number of databases accessed through a wireless network, and proposes strategies to optimize communication costs.

Data retrieval from broadcasts is the topic of this paper. A number of papers have appeared on this subject, including [Barbara and Imielinski 1994; Imielinski et al. 1994a; Imielinski et al. 1994b; Imielinski et al. 1997; Vaidya and Hameed 1996a; Vaidya and Hameed 1996b]. This set of papers forms the background of the work reported in this document. The papers from Rutgers [Barbara and Imielinski 1994; Imielinski et al. 1994a; Imielinski et al. 1994b; Imielinski et al. 1997] primarily deal with data organization in a broadcast, e.g., index organization and structure and cache invalidation strategies. We, on the other hand, are concerned with how to determine optimal broadcast content and design "good" retrieval strategies for users. Note that the two bodies of work (theirs and ours) are complementary: our strategies and theirs put together form a comprehensive broadcasting framework. The papers by Vaidya et al. deal with, among other issues, the notion of broadcast scheduling, i.e., when to broadcast particular items. The metric optimized is access time, i.e., response time. Aside from a basic difference in the broad objectives (we are concerned with broadcast content determination and client retrieval strategies), another point of departure of our work is that we are concerned not only with access time, but also with "tuning time" (the duration of time clients actively listen).
Thus, both efficiency and power conservation are of interest to us.

Finally, a very relevant work from our perspective is the work on Broadcast Disks (BD) performed by researchers at Brown, Maryland and Matsushita Labs. A BD involves the determination of a broadcast program containing all the data items to be disseminated and the subsequent transmission of this program in a periodic manner. The repetitive nature of the broadcast allows the medium to be used as a type of storage device, hence the name "broadcast disk". The novelty of this work is that it proposes a "multi-level" structuring mechanism that allocates bandwidth to data items based on the access importance of these items. BDs are motivated by the notion of communication asymmetry introduced in [Zdonik et al. 1994]. In [Acharya et al. 1995] the authors study cache management policies under the BD paradigm, and extend this work to include prefetching in [Acharya et al. 1996b]. The impact of updates on the base data is studied in [Acharya et al. 1996a], and the impact of a backchannel by which clients can "pull" data items is studied in [Acharya et al. 1997]. While BDs are related to the work presented in this paper, there exist some fundamental differences between them. First, BDs have been proposed for (primarily push-based) "information dissemination" frameworks in general. While the mobile broadcast scenario considered here qualifies as an information dissemination framework, the issue of mobility introduces features that the BD framework does not address [Badrinath 1997; Franklin 1997].

- Firstly, BDs are constructed based on known access probabilities; these are very hard, if not impossible, to know in a mobile framework where clients move in and out of cells at will. Our strategies consider stateless servers, i.e., servers have no a priori knowledge of client movement or access patterns.
- Secondly, for the abovementioned reason (i.e., clients moving in and out of cells), the broadcast content needs to be dynamic, whereas BDs provide a static broadcast content. Even in [Acharya et al. 1997], where the backchannel is studied, the authors make it a point to say that the "broadcast is static", and dynamically changing broadcast content is identified as an area of future study. We have created a framework where the broadcast adapts to changing client access patterns.
- Thirdly, in the BD framework client demands are completely satisfied, i.e., all the demanded items are included in the broadcast. In a mobile scenario, however, the broadcast period may be too small for the inclusion of all items (particularly if some items are broadcast more often than others, as in the BD framework); this means that some method is needed to include certain items in the broadcast in preference to others. The BD framework is unable to handle this. We have created a sophisticated prioritization mechanism to handle this issue.
- Finally, the BD framework exclusively considers latency as the primary performance metric; no attention is paid to energy conservation by the clients. We, on the other hand, simultaneously pay attention to both.

In summary, while the BD work is interesting and relevant, this framework has significant limitations when applied to mobile computing environments. However, a very interesting line of study would be to adapt the BD framework to make it suitable for use by mobile clients and servers, by addressing some of the issues identified above.
3. ARCHITECTURE ISSUES

In this section we give a brief overview of the architectural assumptions made in this paper, both with regard to the underlying networked mobile environment and to the database.

3.1 Mobile Environment

We use a general architecture of a mobile platform, shown in Figure 1 [Dunham and Helal 1995]. It is a distributed architecture in which a number of computers, fixed hosts and base stations, are interconnected through a high-speed wired network. Fixed hosts are general purpose computers which are not equipped to manage mobile units, but can be configured to do so. Base stations are equipped with wireless interfaces and communicate with mobile units to support data access.

[Fig. 1. A general architecture of a mobile platform: fixed hosts and base stations interconnected by a high-speed wired network, with mobile units crossing between wireless radio cells and a wireless LAN cell over wireless links.]

Mobile units are battery-powered portable computers, which move around freely in a restricted area, referred to here as a geographic mobility domain. The size restriction on a unit's mobility is mainly due to the limited bandwidth of wireless communication channels. To manage the mobility of units, the entire geographic mobility domain is divided into smaller domains called cells. The mobile discipline requires that the movement of mobile units be unrestricted within the geographic mobility domain (inter-cell movement).

The mobile computing platform can be effectively described under the client/server paradigm. Thus we sometimes refer to a mobile unit as a client and sometimes as a user. The base stations are identified as servers. Each cell is managed by a base station, which contains transmitters and receivers for responding to the information processing needs of clients located in the cell. We assume that the size of a cell is such that the average query response time is much smaller than the time required by the client to traverse it. Therefore, it will seldom occur that a user submits a query and exits the cell before receiving the response. Clients and servers communicate through wireless channels. The communication link between a client and a server may be modeled as multiple data channels, or as a single channel [Imielinski et al. 1997]. We assume a single channel for reasons outlined in [Imielinski et al. 1997]. We further assume that this channel consists of both an uplink for moving data from client to server and a downlink for moving data from server to client.

3.2 Database Architecture and its Characteristics

Data replication and partitioning are broad areas of research in themselves; they are not the concern of this paper. Here, we assume full database replication. The data is characterized as rapidly changing [Datta 1994]; users often query servers to remain up-to-date. More specifically, they will often want to track every broadcast for their data item of interest. Typical examples of this type of data are stock, weather, and airline information. We assume the following to fully characterize our mobile database.

1. The database is updated asynchronously, i.e., by an independent external process, which does not affect the protocols designed in this paper.
Also, such updates arrive with high frequency, signifying that the database is rapidly changing. Examples of such information are stock, weather, etc.

2. Users are highly mobile and randomly enter and exit cells. There is a parameter called Residence Latency (RL) which characterizes the average duration of a user's stay in the cell. We provide a full explanation of RL in section 5.

3. User reference behavior is localized; e.g., some stocks are more popular than others.

4. Servers are stateless, i.e., they maintain neither client arrival and departure patterns nor client-specific data request information [Imielinski et al. 1997]. We assume a stateless server because we believe that the cost of maintaining a stateful server in a mobile environment would be prohibitively expensive. We want to emphasize, however, that our scheme will work with stateful servers as well.

4. PROBLEM STATEMENT

Wireless networks differ from wired networks in many ways. Database users on a wired network remain connected not only to the network, but also to a continuous power source. Thus, response time is the key performance metric. In a wireless network, however, both the response time and the active life of the user's power source (battery) are important. While a mobile unit is listening or transmitting on the line, it is considered to be in active mode. Assuming a power source of 10 AA batteries and a laptop equipped with CD-ROM and display, estimated battery life in active mode is approximately 2.7 hours [Imielinski et al. 1997]. In order to conserve energy and extend battery life, a client slips into doze (standby) mode, in which she is not actively listening on the channel. Clients expend significantly less energy in doze mode than in active mode. Therefore, one of the major goals of our scheme is to minimize the amount of time a client must spend in active mode to retrieve the data items she requests.

The problem addressed in this paper may be captured by the following question: given that users are highly mobile in their mobility domain, what are good strategies that the server can use to decide on what to broadcast? The assumption is that such strategies need to adapt to user demand patterns in the highly mobile environment. We are also interested in the question of retrieval strategies: given that good broadcast strategies are found, what are good retrieval algorithms by which users can retrieve/download data from broadcasts with a minimum of energy expenditure? The basic idea, inspired by suggestions in [Imielinski et al. 1997], is one of "mixed broadcasting", i.e., automatic as well as on-demand broadcasting.

We follow an approach similar to that presented in [Imielinski et al. 1997] for characterizing the performance of our algorithms. We define the following terms:

Access Time (AT): Access time refers to the time elapsed between query submission and receipt of the response.

Tuning Time (TT): Tuning time is the duration of time that the client spends actively listening on the channel.

The meaning of these terms is illustrated in figure 2. Consider a client who submits a request at time T0 and receives the response at time T7.

[Fig. 2. Access and tuning times, shown on a timeline T0 through T7.]

In this scenario, if the client listens continuously from the time a query is submitted until the response is received, then AT = TT = (T7 - T0). On the other hand, if the client slips into doze mode intermittently, then TT is noticeably less than AT, significantly reducing battery usage. In this case AT = (T7 - T0), and TT = (T7 - T6) + (T5 - T4) + (T3 - T2) + (T1 - T0). This results in energy conservation, as the client is in active mode for only short periods of time.
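To make the two metrics concrete, the following small sketch computes AT and TT for a hypothetical client that listens during four short intervals; all numbers are illustrative and of our choosing, not from the paper.

    # Illustrative only: a client submits a query at T0 = 0, receives the
    # answer at T7 = 70, and is awake only during four tune-in intervals.
    listen_intervals = [(0, 10), (25, 30), (40, 45), (60, 70)]  # (tune-in, tune-out)

    AT = listen_intervals[-1][1] - listen_intervals[0][0]   # T7 - T0 = 70
    TT = sum(out - tin for tin, out in listen_intervals)    # 10+5+5+10 = 30

    assert TT <= AT  # the gap between the two is time spent dozing, i.e., energy saved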
The question, of course, is how to determine the smallest possible tuning intervals. An ideal approach appears to be providing the client with precise knowledge of when her requested information will be broadcast. Our aim is to find optimal points in the two-dimensional space of AT and TT. This is difficult because there appears to be a trade-off between AT and TT: attempts to reduce one tend to increase the other. For example, access time is directly related to the size of the broadcast, i.e., AT is smaller for a smaller broadcast size. On the other hand, providing information for selective auto-tuning, i.e., informing the user precisely where her required data is located in the broadcast, reduces tuning time. However, inclusion of such tuning information would increase the overall size of the broadcast by adding overhead, which in turn could increase AT. Conversely, eliminating this overhead will reduce AT at the expense of an increased TT, because the user will not know precisely when to tune in.

5. PRELIMINARIES

Before delving into the details of our proposed strategies, we first describe certain notions that we will use in developing those strategies.

Broadcast Set and Content: The broadcast set refers to the set of data items included in a broadcast. Broadcast content refers to the composition of a broadcast, i.e., the data tuples along with the access and overhead information included in a broadcast. Note that while the broadcast sets of two successive broadcasts may be identical, the content may change due to changes in the actual data values that may have occurred in the meantime. For example, while IBM stock prices at various markets may be broadcast repeatedly, the actual data tuples corresponding to the IBM prices may change. A major goal of this paper is to devise strategies to determine what the broadcast set should be, given the high degree of mobility as well as the particular demand patterns of clients. In particular, the server must make two decisions: (a) identify items to include in the broadcast, and (b) determine the temporal duration that a particular data item needs to be broadcast, once it is included.

Broadcast Periodicity: The periodicity of the broadcast is important as it affects both the access and tuning times. We consider both periodic and aperiodic broadcasts in our strategies.

Data Organization: Data organization refers to how the broadcast content is organized, e.g., how data and indices are interleaved. Data organization has been addressed very thoroughly in [Imielinski et al. 1997] and is not a research issue in our work; we make use of the results presented there. In particular, for our strategies, we assume a (1,m) indexing strategy as presented in [Imielinski et al. 1997]. This strategy posits that a complete index for the broadcast is repeated every (1/m)th of the broadcast. In other words, the entire index occurs m times during a broadcast, where m is a parameter.

Residence Latency (RL) and Estimated Departure Time (EDT): When the server decides to include an item in its broadcast, it also needs to decide the
length of time this item will remain in its broadcast set. In order to identify the residency duration of a data item in a broadcast, we associate a Residence Latency (RL) value with each cell. RL identifies the duration an item would remain in a broadcast for that cell. The RL value for a specific cell is the average length of time a mobile user resides there. RL for a cell could be computed a priori based on advance knowledge of user movement patterns and cell geography. We believe this is feasible based on information currently available. A data item's Estimated Departure Time (EDT) from a broadcast is computed by adding the item's entry time into the broadcast and the cell's RL.

Popularity Factor: The popularity factor of item X at time T, denoted by PF_X^T, identifies the number of clients in the cell at time T who are interested in X. Henceforth we drop the time superscript in the PF expression without any loss of generality. The PF of item X is maintained in the system as follows. When a client requests X, PF_X is increased by 1. However, every time PF_X is incremented, the system records the corresponding time. Let the timestamp of the ith increment to PF_X be denoted by T_X^i. Then, a corresponding decrement of 1 is performed on the value of PF_X at time (T_X^i + RL). This reflects the (anticipated) departure of the client whose request caused the ith increment.

We now describe how PF for different data items can be computed efficiently. Recall that a popularity factor (PF) is maintained for each data item; each new request for a data item arriving at time t increments its PF by 1, and the PF is subsequently decremented by 1 at time t + RL. Given that there may be a large number of data items in the database, and large client loads, it may appear that maintaining PF is an expensive task. Of particular concern is the scalability of the PF maintenance process with increases in database size. However, as we show below, PF can be maintained extremely efficiently by the use of simple data structures. In particular, the time efficiency of this process is independent of the database size.

We assume the existence of a global clock, which is the basis of synchronization of all time-dependent occurrences in the system. We maintain a queue, termed the PF Decrement Queue (PDQ), in which Decrement Instructions (DIs) are enqueued. A DI is generated every time the PF of any item is increased, and stores the time at which the popularity factor must be decremented. It may be thought of as a 2-tuple <data item, timestamp>. Thus, a DI <X, t_i> indicates that at time t_i, PF_X should be decremented by 1. Upon generation, a DI is immediately enqueued in the PDQ. The PF Maintenance Process (PFMP) monitors the head of this queue. At each time advancement epoch (granularity to be decided by the system designer), the PFMP examines the DI at the head of the PDQ and compares its timestamp to the current time. If the current time is greater than or equal to the timestamp, the DI is dequeued and executed, i.e., the PF of the data item referenced by the DI is adjusted. Note that this adjustment is a basic arithmetic operation, and is thus extremely fast. Moreover, since there is only one PF value per data item, these values can be easily maintained in some sort of directly addressable data structure in memory.
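The sketch below shows one way the PDQ and PFMP just described might be realized; it is a minimal illustration assuming an integer global clock and a fixed RL, not the authors' implementation. Because RL is constant, DIs enter the queue already ordered by timestamp, so only the head needs checking.

    from collections import deque

    PF = {}         # popularity factor per data item
    PDQ = deque()   # PF Decrement Queue: DIs as (data item, decrement time)

    def on_request(item, now, RL):
        """A client request increments PF and schedules the matching decrement."""
        PF[item] = PF.get(item, 0) + 1
        PDQ.append((item, now + RL))       # DI <item, t + RL>

    def pfmp_tick(now):
        """Run at each time-advancement epoch: execute every DI that is due."""
        while PDQ and PDQ[0][1] <= now:    # head of the queue holds the earliest DI
            item, _ = PDQ.popleft()
            PF[item] -= 1                  # anticipated departure of the requester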
We do not anticipate the size of this data structure being overly large; e.g., if a broadcast were to disseminate stock information for every publicly traded stock on earth, it would need to keep track of around 8000 data items (2900 for stocks traded on the NYSE [Inc. 1997]). Clearly, therefore, the complexity of the PFMP process is quite low: of the order of a few instructions per unit time (one comparison, one (possible) addition and one dequeue). This is well within the capabilities of even current desktop machines, let alone the powerful server hardware that would very likely be used for cell servers. Note that the complexity of this process is independent of the size of the database. Also, the rapidity of the PFMP process does not allow the PDQ to grow too large, which is an added benefit.

Dynamism of the Underlying Database: It has been widely suggested (e.g., see [Imielinski et al. 1997]) that certain information that people are likely to access in a mobile fashion is dynamic, e.g., stock information, weather and traffic. This rapidly changing nature makes these databases more difficult to manage. In this paper we are interested in these types of systems. The rapidly changing nature of the database has some effects on the broadcasting strategy. In particular, in this scenario, there is a reasonable likelihood that the database state will change from one broadcast to another. Such changes often occur in bursts; e.g., a large fraction of stock trades occur in the last hour of trading. At these times, clients are particularly interested in retrieving current values for their data items of interest. Moreover, at these times, clients would often download data items from every broadcast to keep themselves apprised of the rapidly changing current state. It is easily seen that this can be quite expensive, as the client needs to tune in to every broadcast. We are particularly interested in minimizing a client's cost (i.e., energy expenditure) in such scenarios. Our basic strategy may be explained through the following example. Assume that a client is interested in a particular data item, corresponding to which there exist B buckets in a broadcast. Assume further that the client would like to download the same information in the following broadcast. Now, if only B_dirty buckets have changed, where B_dirty < B, it is advantageous for the client to simply download the dirty (i.e., modified since the previous broadcast) buckets from the subsequent broadcast, rather than the full set of B buckets. One can easily see that such savings quickly accumulate over a few successive broadcasts. In order to execute this strategy, we maintain certain special metadata in buckets which allows clients to selectively tune in to dirty buckets. That metadata, and our entire broadcast structure, is explained below.
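Before turning to that structure, a quick back-of-the-envelope calculation (numbers ours, purely illustrative) shows how fast these savings accumulate:

    # A DCI of B = 20 buckets, of which only B_dirty = 3 change per broadcast,
    # tracked over n = 10 successive broadcasts.
    B, B_dirty, n = 20, 3, 10
    naive_cost = B * n                   # 200 buckets read, re-downloading everything
    smart_cost = B + B_dirty * (n - 1)   # 47 buckets: one full read, then dirty only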
5.1 Broadcast Structure

Broadcast structure refers to the specific broadcast organization that we assume in developing our strategies. It is important to understand this structure in order to properly appreciate our protocols. As mentioned before, we assume the (1,m) indexing strategy outlined in [Imielinski et al. 1997]. In this scheme, index information is provided at regular intervals in the broadcast: a complete index is inserted m times in a broadcast at regular intervals. Figure 3 illustrates our broadcast structure.

[Fig. 3. Broadcast structure: (A) a broadcast as a sequence of index segments and data blocks; (B) a data block as a sequence of data clusters; (C) a data cluster as a sequence of data buckets with the same key value, each carrying an <X,Y,Z,N,EB> tuple; (D) an index segment as a sequence of index buckets containing <K,B,C,EC> tuples.]

A broadcast is a sequence of data blocks (containing data) and index segments (containing access information), as shown in figure 3A. Using the (1,m) data organization methodology, an index segment appears every (1/m)th of the broadcast, i.e., there are m index segments. Clearly, each of the m data blocks is also of equal size. Each data block is composed of one or more data clusters, as shown in figure 3B, where each data cluster consists of a collection of tuples of a single data item of interest. For example, assume the broadcast consists of stock information, and each tuple is a triple <stock id, price, market>. In such a scenario, the data items of interest would be represented by stock ids. Consider a particular stock id, e.g., IBM. All IBM records would comprise a data cluster. A data cluster may span more than one data block. Data clusters are composed of data buckets (figure 3C), which contain data records as well as some tuning information (denoted by the 5-tuple <X, Y, Z, N, EB> in the figure), explained below. We assume that each client has her own item of interest (e.g., clients are not interested in all stocks, but instead in specific ones). For the purposes of this study, we assume a client has a single data item of interest. As explained above, all records pertaining to this item appear in a specific data cluster, which we refer to as the client's Data Cluster of Interest (DCI). Within the broadcast, the data clusters are organized in order of decreasing popularity based on PF, such that the most popular item is broadcast first, and the least popular item last. This helps to reduce the access times for popular items.

An index segment is a series of buckets containing index tuples and some other special tuning information. We first describe the index tuples. Each index tuple is a 4-tuple, <K, B, C, EC>, that not only informs the client precisely where the DCI appears in the broadcast, but also provides information about updates to the cluster since the previous broadcast. The structure of the index segment is shown in figure 3D. K, B, C and EC are defined below.

K: the cluster's key value (e.g., for an IBM cluster, the key value is IBM).
B: the id of the first bucket of the data cluster.
C: the offset from B to the first dirty bucket (a bucket where changes have occurred since the last broadcast) of the data cluster. If all buckets in the data cluster are clean, C takes a default value of -1.
EC: the EDT of the cluster key, i.e., when the cluster is scheduled to be dropped from the broadcast.

The dirty/clean information (i.e., B and C) is included to handle the rapidly changing data scenario explained earlier in this section. We assume a B-tree structure for the index. Thus, clients must begin reading the index at the root in order to find the pointers to their DCIs.

As mentioned above and shown in figures 3C and D, all buckets, whether index or data, carry a special tuple displayed as a 5-tuple <X, Y, Z, N, EB>. This information is provided to orient clients as they initially tune in to a broadcast. The X, Y, Z, N and EB terms are defined as follows.

X: an offset to the first bucket of the next nearest index segment.
Y: an offset to the end of the broadcast, i.e., the start of the next broadcast.
Z: shows the bucket's type (data or index) and contains tuning information for items updated since the previous broadcast.
It can hold one of four possible values:

Z = -2 indicates an index bucket.
Z = 0 indicates a data bucket, and that the bucket is clean, i.e., unmodified since the previous broadcast.
Z = i, where i is a positive integer, indicates a data bucket, and that the bucket is dirty, i.e., modified since the previous broadcast. Moreover, the actual i value is an offset to the next dirty bucket in the same data cluster.
Z = -1 indicates a data bucket, and that it is the last dirty bucket of the data cluster.

N = 0 indicates a data bucket, and that it is the last data bucket of the data cluster.
N = i indicates that this is not the last bucket of a DCI, and that the offset to the next data bucket of the same DCI is i.

EB: the EDT of the data item in the bucket. Obviously, the EB value of every bucket in the same data cluster is identical and equal to the EDT of the cluster key (e.g., in an IBM cluster, all EB values equal the EDT of the IBM data item).

While the reasons for including X, Y and EB are obvious, the utility of N and Z is not. The N values in each bucket are used as a means to identify the last bucket of a data cluster. In the indexing scheme used, there is a possibility that an index segment is inserted within a set of data buckets of the same data item, i.e., the data cluster is fragmented across two or more data blocks. The N value gives the offset to the next data bucket of the same DCI, thus making it possible to listen to successive data buckets regardless of whether or not an index exists between them. The Z values are used to facilitate efficient retrieval in the rapidly changing data scenarios explained earlier in this section; this will be illustrated in section 6.2.

We fix the unit of retrieval as a bucket, following the lead of [Imielinski et al. 1997], where a bucket is defined as the smallest logical unit of broadcast. All buckets (data or index) are of equal size. This allows for a direct mapping between time and buckets; in fact, time is measured in terms of buckets in this paper. This means that all the offsets referred to in the preceding description are in terms of buckets.

Finally, we add a note regarding the notion of data clusters. A data cluster consists of a sequence of buckets containing data tuples pertaining to a specific key, such as a specific stock id. Moreover, we assume that selectivity, i.e., the number of buckets required to broadcast a specific data item, is fixed. This is a reasonable assumption. Consider, for example, stock information expressed as a 3-tuple <stock id, market, price>. Clearly, for a particular key value (say IBM), the number of tuples is equal to the number of markets (e.g., New York Stock Exchange) where the stock is traded. This number is fairly constant, making the number of buckets for a particular cluster fairly stable. We also assume that specific buckets within a cluster have identical semantic content from one broadcast to another. For instance, the first bucket within the IBM cluster will always contain the tuples related to the IBM prices in New York and Frankfurt (assuming there are 2 tuples per bucket). Thus, while the individual prices may change, the semantic content of each bucket within a cluster is invariant.
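For reference, the two tuples defined in this section can be summarized as plain records. The sketch below is only a notational aid: the field comments paraphrase the definitions above, while the class names are ours.

    from dataclasses import dataclass

    @dataclass
    class TuningInfo:   # the <X,Y,Z,N,EB> tuple carried by every bucket
        X: int          # offset to first bucket of the next nearest index segment
        Y: int          # offset to end of broadcast (start of the next broadcast)
        Z: int          # -2: index bucket; 0: clean data bucket;
                        # i > 0: dirty, offset to next dirty bucket; -1: last dirty bucket
        N: int          # 0: last data bucket of the cluster; i: offset to next DCI bucket
        EB: int         # EDT of the data item in this bucket

    @dataclass
    class IndexTuple:   # the <K,B,C,EC> tuple, one per cluster in an index segment
        K: str          # cluster key value, e.g., "IBM"
        B: int          # id of the first bucket of the data cluster
        C: int          # offset from B to the first dirty bucket, or -1 if all clean
        EC: int         # EDT of the cluster key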
6. SERVER AND CLIENT PROTOCOLS TO SUPPORT ADAPTIVE BROADCAST CONTENT AND EFFICIENT RETRIEVAL

This section proposes adaptive broadcast protocols which seek an optimal balance of access time (quality of service, or average query response time) and tuning time (energy consumption). We aim to develop cooperative strategies between clients and servers for achieving such a balance. As mentioned earlier, periodicity is an important parameter in designing broadcast strategies. A periodic broadcast signifies that the broadcast "size" (i.e., number of buckets) is fixed. One can ascribe both advantages (e.g., predictability) and disadvantages (e.g., loss of flexibility) to periodicity. To study such effects, we describe two sets of protocols below, for the periodic and the aperiodic cases. We refer to the periodic protocol as the constant broadcast size (CBS) strategy, whereas the aperiodic broadcast protocol is termed the variable broadcast size (VBS) strategy. Finally, note that we support a mixed mode (this term is borrowed from [Imielinski et al. 1997]) retrieval policy: when a client arrives in a cell, she first tunes in to the broadcast to see if her DCI is already there. If not, the client explicitly sends a request to the server through the uplink for that item. Thus items may be found readily in the broadcast, or may have to be placed "on demand". This policy has been deemed the most "general" policy in the literature.

6.1 Constant Broadcast Size Strategy

We first present the server protocol, i.e., the strategy used by the server in deciding upon the broadcast content. We then present the client protocol, i.e., how the client retrieves data from the broadcast.

6.1.1 CBS Server Protocol. In this strategy, broadcast size is limited, and the broadcast is periodic. Periodicity mandates an equal size for every broadcast (recall that we consider both size and time in terms of buckets). If there are too few requested items to fill the broadcast period, the broadcast will contain dead air. On the other hand, if there are more requested items than space in the broadcast, the server must prioritize requested items to decide which to include in the broadcast set. We propose that this prioritization mechanism simultaneously satisfy two properties: popularity consciousness and avoidance of chronic starvation. Popularity consciousness means that items that are requested more often should have a greater chance of being included in the broadcast than less popular items. Avoidance of chronic starvation means that if a client requests a "less popular" item, he should not be chronically starved, i.e., the item should appear in the broadcast at some point during that client's residence in the cell.
At a minimum, our protocol attempts (but does not guarantee) to provide access to a requested data item at least once during a client's probable stay in the cell; that is, within RL time of the request. We propose a system of priority ranking of items based on two factors: a Popularity Factor (PF) and an Ignore Factor (IF). PF is computed as described in section 5. We now describe the IF feature of our model.

Ignore Factor. Consider a strategy where the inclusion priority of items is based purely on popularity, i.e., items are included in the broadcast based on a Highest-PF-First (HPF) strategy. Under such a strategy, assuming a large number of clients in a cell and a data set in which a small subset of items is very popular, clients requesting less popular items may be chronically starved. That is, items with a very low PF may be overwhelmed by the large number of requests for very popular items, and will not be broadcast while the requesting client is still in the cell. Recall that in our operating scenario, clients are highly mobile; thus, if a request is ignored for a long time, it is likely that the client will leave the cell without receiving service for that item throughout her entire residence in that cell. From a quality of service standpoint, this is highly undesirable. At the very minimum, our goal is to attempt to service a client's request at least once during her stay in the cell in which the request was made. The purpose of the Ignore Factor (IF) is to ensure that less popular but long-neglected items get an opportunity to be included in the broadcast.

We first explain the notion of IF. Assume a data item, say X, was last reported in the broadcast ending at time T0, after which it was dropped. At this point, the ignore factor of X, denoted by IF_X, is defined to be 1, the base value for any item's ignore factor (the reason for choosing a minimum IF of 1, and not 0, will be explained later). Further assume that the first request for X after time T0 arrived at time T_req. Subsequent to this point, whenever X is not reported in a broadcast, we say that X has been ignored. More specifically, the ignore factor of X at any time T_i (assuming X has not been reported until T_i), T_i > T_req, denoted by IF_X^Ti, is given by IF_X^Ti = NumBroadcast(T_i, T_req) + 1, where NumBroadcast(T_i, T_req) denotes the number of broadcasts between T_req and T_i. Since we are considering a periodic broadcast scenario, this number is easy to compute. In particular, assuming a broadcast period of P_B, NumBroadcast(T_i, T_req) = ⌊(T_i - T_req)/P_B⌋. In other words,

    IF_X^Ti = ⌊(T_i - T_req)/P_B⌋ + 1                                  (1)

When talking about the IF of a particular data item from here on, we will often drop the temporal superscript, i.e., T_i, unless the context warrants its explicit inclusion. Note that equation 1 is only valid under the constraint (T_i - T_req) ≤ RL. To see this, consider a client C who requested data item X at time T_req. Assume no other requests arrive for this data item. Then, at any time T_i following T_req, IF_X is given by equation 1 above. However, following the definition of RL, client C would be expected to depart at T_dep = T_req + RL. Thus after T_dep, there is no point in reporting X any longer, as its requester may have left. Thus, after T_dep, unless other requests for X have arrived in the meantime, IF_X should be reduced to 1 (the base value for any item's IF is 1; this is explained below).
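As a quick illustration of equation 1, with numbers of our choosing:

    # Broadcast period P_B = 100 bucket-times; first request at T_req = 250;
    # the item has been ignored until T_i = 580.
    P_B, T_req, T_i = 100, 250, 580
    IF = (T_i - T_req) // P_B + 1   # floor(330/100) + 1 = 4
    # valid only while (T_i - T_req) <= RL; beyond that, IF reverts to its base value 1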
A corollary of this is that the following inequality holds for any data item X_i:

    1 ≤ IF_X_i ≤ ⌊RL/P_B⌋ + 1

While the above discussion lays the foundation for the fundamental ideas behind the notion of ignore factor, we still have to describe a general procedure for computing it. To this end, consider a data item X for which a stream of requests arrives. In this case, computation of IF_X assumes greater complexity than discussed before. To see this, consider the case where two requests arrive for X, at times T_req1 and T_req2 respectively, where T_req2 > T_req1. Assume X is not included in any broadcast until time T_dep1 = T_req1 + RL. At T_dep1, according to the discussion in the two preceding paragraphs, IF_X = ⌊RL/P_B⌋ + 1. However, as soon as the clock ticks past T_dep1, the request that came in at T_req1 is considered void. Thus it should no longer determine IF_X. Rather, after T_dep1, IF_X should be computed based on T_req2. More specifically, at all times during the interval [T_dep1, T_dep2] (excluding the initial temporal instant), where T_dep2 = T_req2 + RL, IF_X is given by ⌊(current time - T_req2)/P_B⌋ + 1.

From the simple two-request example above, one important fact emerges: in order to maintain an accurate record of an item's IF, one must track every request that arrives for that item. With a large number of items and clients, such maintenance could be prohibitively expensive. In particular, the complexity of this process for a specific data item is of order O(N), where N is the number of requests for that item. However, we have devised a mechanism by which the IFs of items can be maintained accurately at minimal cost. In particular, as will become clear, our strategy records IF for a data item in constant time. We first provide the intuition for this process and then describe it in detail.

To see the basic idea behind our mechanism, consider figure 4, in which we show a timeline with particular points of interest.

[Fig. 4. Computation of ignore factor: broadcast epochs B1 through B4, request times T_req1 through T_req4, and departure points T_dep1 and T_dep2 on a timeline.]

The points marked with the darker lines, i.e., B1, B2, B3, B4, denote broadcast initiation epochs, i.e., instants at which successive broadcasts start. The ith request for data item X arrives at T_reqi; in figure 4, we show four requests. In this example, we assume RL = 2 P_B. Corresponding to the request points, the departure points are shown as T_depi. According to the discussion above, we would need to keep track of all four requests in order to accurately compute IF_X at any point. However, note that IF values are only used at the time the server decides on the next broadcast set, i.e., immediately preceding the broadcast epochs. Consider, as a specific example, the broadcast epoch B4. At B4, in figure 4, the request at T_req1 has expired. Note that the request at T_req2 has also expired. Moreover, we can safely say that at B4, all requests that arrived prior to B2 would have expired. In general, at all decision points where the next broadcast set has to be decided, i.e., at successive broadcast epochs, all requests that arrived prior to the ⌊RL/P_B⌋th broadcast preceding the next immediate broadcast are guaranteed to have expired. Thus, IF_X needs to be computed based on the first request that arrived after the ⌊RL/P_B⌋th broadcast preceding the next immediate broadcast. This is the key intuition. For example, at B4, IF_X needs to be computed based on T_req3.

The significance of this intuition is that the system need only keep track of the first request after each broadcast epoch, and can ignore all successive requests until the next broadcast epoch. Moreover, only the last ⌊RL/P_B⌋ epochs need to be considered. In other words, the system need only maintain a list of ⌊RL/P_B⌋ requests. This may be achieved by maintaining an ordered list in memory. Elements in this list contain both the tid and the timestamp of the request. The list has ⌊RL/P_B⌋ + 1 elements, with tids 1 through ⌊RL/P_B⌋ + 1, where the timestamp corresponding to the ith element yields the timestamp of the first request for X after the ith previous broadcast epoch with respect to the next immediate broadcast. The list is updated upon the arrival of the first request for a data item after each broadcast epoch. Such a list can be implemented as a circular queue. Using such a structure, an item can be accurately tracked in constant (⌊RL/P_B⌋) time.
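The sketch below illustrates this bookkeeping for a single data item. It is our simplified rendering of the scheme, assuming a fixed RL and P_B: only the first request after each broadcast epoch is recorded, a bounded circular queue holds the last ⌊RL/P_B⌋ + 1 of them, and IF is computed from the oldest unexpired entry via equation 1.

    from collections import deque

    class IFTracker:
        """Constant-time ignore-factor tracking for one data item."""
        def __init__(self, RL, P_B):
            self.RL, self.P_B = RL, P_B
            self.first_reqs = deque(maxlen=RL // P_B + 1)  # circular queue
            self.seen_this_epoch = False

        def on_request(self, t):
            if not self.seen_this_epoch:       # only the first request per epoch matters
                self.first_reqs.append(t)
                self.seen_this_epoch = True

        def on_broadcast_epoch(self):
            self.seen_this_epoch = False       # start recording for the new epoch

        def ignore_factor(self, now):
            # equation 1, applied to the oldest request that has not yet expired
            live = [t for t in self.first_reqs if now - t <= self.RL]
            if not live:
                return 1                       # base value: no live requests
            return (now - live[0]) // self.P_B + 1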
Priority computation using IF and PF: Having explained IF, we now turn our attention to computing the inclusion priority for data items. We propose that an item's priority be computed based on the following expression:

    Priority = IF^ASF · PF                                             (2)

where ASF is an Adaptive Scaling Factor, described below. Equation 2 also explains why we set the base value of IF at 1 and not 0. It is easily seen that, at an IF of 0, the priority value (even for popular items) would be reduced to 0, which guarantees the item's exclusion from the broadcast.

Equation 2 exploits the counteracting trends of IF and PF. Using the above expression, an item which has been requested by a large number of clients will have a large PF. It is thus likely to appear in many broadcasts, and will have a relatively low IF. On the other hand, an item with few requests will have a low PF. Based on its low PF, it is likely to be omitted from broadcasts, increasing its IF and therefore its chances of being included in a future broadcast set.

ASF is an adaptive exponential weighting factor based on a nearest neighbor approach. Its purpose is to increase the likelihood that items which have been ignored for a long time will appear in the broadcast. PF and IF differ largely in scale; if ASF is relatively low, PF dominates the priority expression (limited by the number of clients in a cell). If, however, an item has been ignored for a long time, we would like IF to dominate; a larger ASF value achieves this. In order to precisely define ASF, let W_ij denote the number of broadcasts client i has waited for item j after requesting it, and let C_j denote the set of clients who are waiting for j. Then the average wait count (AWC) for j is given by

    AWC_j = ( Σ_{i ∈ C_j} W_ij ) / |C_j|

Let the Desired Wait Count (DWC) denote a quality of service measure defined for the system: DWC denotes, on average, the maximum number of broadcasts a client can reasonably be expected to wait before receiving service for her desired item. When computing the priority of an item, we compare AWC and DWC. If AWC > DWC, ASF is incremented by unity. If AWC ≤ DWC, ASF remains unchanged. ASF is initialized to a base value of 1 for all data items, and is reset to this base value each time the item is included in the broadcast.

Having explained the underlying concepts, we are now prepared to describe the server protocol for constructing a broadcast. Prior to a broadcast epoch (the time at which a new broadcast is scheduled to begin), i.e., in its broadcast preparation stage, the server prioritizes all items which have been requested, i.e., items with a PF > 0, and sorts the items in order of descending priority. It then adds items to the broadcast set until the broadcast is full. For all requested but excluded items, IF is adjusted.
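Putting the pieces together, the broadcast preparation stage might look like the sketch below. This is our rendering, not the authors' implementation: the Item record and the greedy fill are assumptions consistent with, but not mandated by, the description above.

    from dataclasses import dataclass

    @dataclass
    class Item:
        key: str
        PF: int = 0            # popularity factor (section 5)
        IF: int = 1            # ignore factor (base value 1)
        ASF: int = 1           # adaptive scaling factor (base value 1)
        AWC: float = 0.0       # average wait count over waiting clients
        num_buckets: int = 1   # selectivity: buckets needed by this item's cluster

    def priority(item):
        return (item.IF ** item.ASF) * item.PF       # equation 2

    def prepare_broadcast(items, capacity, DWC):
        """Build the next broadcast set from all requested items."""
        requested = [x for x in items if x.PF > 0]
        for x in requested:                          # adapt the scaling factor
            if x.AWC > DWC:
                x.ASF += 1                           # long-ignored: let IF dominate
        requested.sort(key=priority, reverse=True)
        broadcast_set, used = [], 0
        for x in requested:
            if used + x.num_buckets <= capacity:     # greedy fill until full
                broadcast_set.append(x)
                used += x.num_buckets
                x.ASF = 1                            # reset upon inclusion
            # excluded items keep accruing IF via the request-tracking mechanism
        return broadcast_set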
6.1.2 CBS Client Protocols. We now describe client protocols designed to retrieve data from the broadcast intelligently, in cooperation with the server protocol defined above. When a client senses the need for a particular data item, he begins the retrieval process by tuning in to the broadcast at an arbitrary time and reading a bucket. We remind the reader that the data cluster in the broadcast that holds the item of a client's interest is referred to as the Data Cluster of Interest (DCI). The random initial probe into a continuous flow of broadcasts creates a large number of tuning possibilities; we identify eight mutually exclusive initial tuning cases. Note that because we assume a B-tree index structure, the client must start reading from the top of the index segment (i.e., the root). If she does not find a pointer to her DCI (i.e., the DCI is not in the current broadcast set), then she requests the item and tunes in to the initial index of every succeeding broadcast until either the DCI is found in a broadcast, or she departs from the cell. We discuss the cases below. We assume, for simplicity, that any probe into a broadcast always puts the client at the top of a bucket, i.e., we ignore the fact that the client may tune in at any point within a bucket. The error introduced by this, as is easily seen, is very small (0.5 bucket on average).

Case 1: Client initially tunes in to an index bucket, DCI exists in current broadcast, and the client has tuned in before DCI in the broadcast.

In this case the client tunes in to an index segment. If he has missed the root, he must tune out until the top of the next nearest index segment. This is efficiently achieved by noting the X value (described in section 5.1) from the 5-tuple of tuning information provided with every bucket. Upon reading the index, the client finds that his DCI has not yet been broadcast, and tunes out until the beginning of his data cluster of interest. At the beginning of his DCI, he tunes back in and downloads it. The retrieval protocol in this case is given in Protocol 1.

Protocol 1 CASE 1
    access top of next nearest index segment;
    read index;
    wait until beginning of DCI;
    download DCI;
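In code, Case 1 might look like the sketch below. The channel primitives (read_bucket, doze_for) and the index helpers are hypothetical placeholders for a broadcast channel interface; only the use of the X offset, the B pointer, and the N offsets follows the protocol above.

    def retrieve_case1(key, read_bucket, doze_for):
        """Case 1 sketch: initial probe hit an index bucket ahead of the DCI."""
        bucket = read_bucket()                # initial probe
        if bucket.missed_root:                # probed past the root of this segment
            doze_for(bucket.tuning.X)         # doze to next nearest index segment
            bucket = read_bucket()
        entry = bucket.lookup(key)            # walk the B-tree index from the root
        doze_for(entry.B - bucket.id)         # doze until the DCI's first bucket
        cluster = []
        while True:
            b = read_bucket()
            cluster.append(b)
            if b.tuning.N == 0:               # last data bucket of the DCI
                return cluster
            if b.tuning.N > 1:                # an index segment intervenes:
                doze_for(b.tuning.N - 1)      # follow the N offset over it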
If her DCI will appear in the next broadcast, she tunes out until the top of the next broadcast, reads the index information, and downloads her DCI as outlined in the previous case. If her DCI is scheduled to be dropped, she submits a request for it, and keeps tuning in to the top of every subsequent broadcast until a pointer to it appears in the index, or she leaves the cell. The pseudocode for Case 2, combined with that of Case 3, is shown below.

Case 3: Client initially tunes in to an index bucket, DCI does not exist in current broadcast. This case is very similar to the second subcase of Case 2. After becoming aware of the non-existence of his DCI in the current broadcast, by appropriate access into an index segment, the client submits a request for the data item, and keeps tuning in to the top of every subsequent broadcast until his DCI appears, or he leaves the cell. The protocol is as given in Protocol 2.

Protocol 2 CASE 2 & 3
    access top of an index segment;
    read index;
    if DCI missed or excluded from current broadcast then
        wait until beginning of next broadcast;
        read index;
    if DCI not in index then
        submit request for DCI;
        while DCI not in index
            wait until beginning of next broadcast;
            read index;
    wait until beginning of DCI;
    download DCI;

Case 4: Client initially tunes in to a data bucket, DCI exists in current broadcast, and the client has tuned in before DCI in the broadcast. The procedure is exactly the same as in Case 1.

Case 5: Client initially tunes in to a data bucket, DCI exists in current broadcast, and the client has tuned in after DCI in the broadcast. Same as Case 2.

Case 6: Client initially tunes in to a data bucket, DCI does not exist in current broadcast. Same as Case 3.

Case 7: Client tunes in to a data bucket in the middle of the DCI in the current broadcast. This is the most complex of all cases, and also the most important, as the same protocol would be used by clients who are interested in downloading their DCIs continually from successive broadcasts. In this case the client tunes in and becomes aware that her DCI is currently being broadcast. She immediately begins to download the data. Since she missed a portion of her DCI in the current broadcast, the remainder must be obtained from the next broadcast. At the end of the data cluster, having already determined whether her DCI is scheduled to be dropped from the next broadcast (through methods outlined earlier), she tunes out and waits for the beginning of the next broadcast. If her DCI is going to be dropped, she sends a request on the uplink and keeps tuning in to successive broadcasts until her DCI is found in the index. The more interesting case occurs when she determines that her DCI is not going to be dropped, i.e., it is going to appear in the next broadcast. In this case, as discussed in the "rapidly changing" data description in section 5, she would like to download not only the portion of her DCI missed in the previous broadcast, but also the buckets that have been modified between the broadcast epochs. This is achieved as follows. The client downloads the portion of data missed in the earlier broadcast, up to the point at which she began downloading data in the previous broadcast. She then selectively tunes in and out to retrieve any updates (following the linked list of dirty buckets) to the data downloaded in the previous broadcast, until the last dirty bucket in her data cluster of interest.
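Conceptually, this selective retrieval is a traversal of the dirty-bucket linked list. The Python sketch below is our own illustration, not part of the protocol specification: the Bucket record, the broadcast array, and the treatment of the index tuple's C value and the buckets' Z values as bucket offsets (with -1 ending the chain) are assumptions made for the example.

    # Sketch of selective retrieval across two broadcasts; all names are
    # illustrative. Z = offset to the next dirty bucket, -1 ends the chain.

    from dataclasses import dataclass

    @dataclass
    class Bucket:
        data: bytes
        z: int                       # offset to next dirty bucket; -1 if none

    def selective_download(broadcast, dci_start, index_c, prev_read_pos):
        """Download the missed prefix of the DCI, then only its dirty buckets.

        broadcast:     list of Bucket (the current broadcast)
        dci_start:     position of the DCI's first bucket
        index_c:       C value of the DCI's index tuple (offset of the first
                       dirty bucket in the cluster), or -1 if the cluster is clean
        prev_read_pos: position where downloading began in the prior broadcast
        """
        downloaded = []
        pos = dci_start
        while pos < prev_read_pos:           # 1. recover the missed prefix
            downloaded.append(broadcast[pos])
            pos += 1
        if index_c < 0:
            return downloaded                # entire data cluster is clean
        next_dirty = dci_start + index_c
        while next_dirty < pos:              # 2. skip chain entries already read
            z = broadcast[next_dirty].z
            if z == -1:
                return downloaded            # no dirty buckets beyond the prefix
            next_dirty += z
        while True:                          # 3. tune in only for dirty buckets
            bucket = broadcast[next_dirty]
            downloaded.append(bucket)
            if bucket.z == -1:
                break
            next_dirty += bucket.z
        return downloaded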
Protocol 3 and Procedure 1 provide details regarding how the selective tuning is achieved.

Protocol 3 CASE 7
    access middle of DCI;
    download remaining buckets of DCI;
    wait until beginning of next broadcast;
    read index;
    if DCI in index then
        wait until beginning of DCI;
        execute procedure DYNAMIC READ;
    else
        submit request for DCI;
        while DCI not in index
            wait until beginning of next broadcast;
            read index;
        wait until beginning of DCI;
        download DCI;

Case 8: Client tunes in to an index bucket in the middle of the DCI in the current broadcast. Same as Case 2.

6.2 Variable Broadcast Size Strategy

Having discussed a periodic broadcasting strategy, we now turn our attention to an aperiodic broadcasting scenario. We call this strategy the variable broadcast size (VBS) strategy, for obvious reasons. Note that while the broadcast size varies across broadcasts, at the start of each individual broadcast the size is known; therefore, the start of the subsequent broadcast is known as well. However, the server has no knowledge beyond that, as it does not know what requests may arrive during the current broadcast.

Procedure 1 DYNAMIC READ
    if DCI_index.C > 0 then                 /* index tuple C value of the DCI */
        next_dirty ← DCI_index.C;           /* find 1st dirty bucket */
    else
        exit;                               /* entire data cluster is clean */
    while not at point in DCI where began reading in previous broadcast
        download bucket;                    /* recover the missed portion */
        if current_bucket.id = next_dirty then
            if current_bucket.Z = -1 then
                next_dirty ← -1;            /* no further dirty buckets */
            else                            /* next dirty bucket is in the cluster */
                next_dirty ← next_dirty + current_bucket.Z;
        advance to next bucket;
    if next_dirty = -1 then                 /* the rest of DCI is clean */
        exit;
    wait until next_dirty;
    while (true)                            /* read all remaining dirty buckets in DCI */
        download bucket;
        if current_bucket.Z = -1 then
            exit;
        next_dirty ← next_dirty + current_bucket.Z;
        wait until next_dirty;

6.2.1 VBS Server Protocol. The server protocol for VBS is much simpler than that for the constant size strategy. All requested items are included, i.e., all items with a PF greater than 0 are added to the broadcast set. The broadcast length changes as items are added and deleted. Items remain in the broadcast for RL units of time from their latest request, and are subsequently dropped from the broadcast set. Within the broadcast, items (i.e., DCIs) are ordered by descending popularity. Since no item is ignored, there is no notion of an ignore factor in VBS.

6.2.2 VBS Client Protocols. The client protocols in this strategy are similar to those of the CBS strategy. The main difference between these strategies is in the client's response to finding that his DCI is not in the broadcast. Here, if his DCI is not in the broadcast, or if he has missed his DCI and the item will be dropped when the next broadcast is composed, the client requests the item and exits from the protocol. Since his DCI is guaranteed to be in the succeeding broadcast, he begins the retrieval process at the beginning of the next broadcast and finds his DCI in that broadcast.

7. AN APPROXIMATE ANALYSIS FOR THE CONSTANT AND VARIABLE BROADCAST SIZE STRATEGIES

In this section, we are primarily interested in obtaining expressions for the expected access time (AT) and tuning time (TT) for both the constant broadcast size (CBS) and the variable broadcast size (VBS) strategies. Both strategies, as the reader may recall from section 6, are presented as a set of mutually exclusive tuning scenarios.
For each scenario, we compute AT and TT. We also compute the probability of encountering each scenario. Then, using elementary probability theory, the expected access time (E[AT]) and expected tuning time (E[TT]) may be stated as:

\[
E[TT] = \sum_{\forall i} P\{\text{Scenario } i\}\, TT_i; \qquad E[AT] = \sum_{\forall i} P\{\text{Scenario } i\}\, AT_i \tag{3}
\]

In order to achieve analytical tractability, we make several assumptions in performing our analysis; thus, our results are approximate. Our assumptions are as follows: (a) clients read from the broadcast in units of buckets; (b) clients always tune in to the start (i.e., the top) of a bucket; (c) if the client tunes in to the middle of an index segment, she has missed the root of the B-tree, and must wait for the beginning of the next nearest index segment; (d) a cell contains N clients, and each client can request at most one DCI during a single broadcast period; and (e) there are two classes of data items: hot items, which have a higher probability of being requested by clients than cold items, with h percent of DCIs hot and 1 - h percent cold. In addition, certain special assumptions are made for the CBS and VBS strategies; these assumptions are stated preceding each respective analysis section. The next two subsections list our results. Table 1 lists the notation used in our analysis.

Notation   Meaning
d          size of the data to broadcast (in tuples)
S          selectivity of a data item (in buckets)
n          capacity of a bucket (tuples per bucket)
m          number of index segments (or data blocks) per broadcast
B          size of a data block (buckets per block) = d/(nm)
I          size of an index segment (in buckets) = 2d/(Sn)
T_B        size of the broadcast in CBS = m(I + B)
b          number of DCIs = d/(Sn)
C_CBS      capacity of the broadcast in CBS (in DCIs)
l          number of index buckets read per index probe = log_{n+1} I
L_1        expected distance to the DCI from the end of an index segment (buckets)
L_2        expected distance to the next broadcast from the end of an index segment (buckets)
p_h        probability that a client requests a hot item
p_c        probability that a client requests a cold item
p_{h,in}   probability that a hot item is in the broadcast
p_{c,in}   probability that a cold item is in the broadcast
P_in       probability that an item is in the broadcast
h          fraction of hot items among the d items
c          fraction of cold items among the d items
p_0        probability that a client does not request any DCI
k          total number of DCIs in the database

Table 1. Notation used in the Analysis

Our analysis is probabilistic, i.e., we compute the probabilities of certain events. The events are summarized in Table 2.

Event   Meaning
M       DCI is missed completely
M̄       DCI is downloaded completely (i.e., not missed at all)
PM      DCI is missed partially
D_i     DCI is in the ith data block
TI_i    client tunes in to the ith index segment first
TD      client tunes in to a data block first
TI      client tunes in to an index segment first
PH      client requests a hot item
PC      client requests a cold item

Table 2. Probabilistic Events
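Equation 3 is a plain expectation over mutually exclusive scenarios. As a minimal illustration (the scenario tuples below are placeholders, not derived values):

    # E[TT] and E[AT] as probability-weighted sums (Equation 3).
    # Each scenario is (probability, TT_i, AT_i); the numbers are placeholders.

    def expected_times(scenarios):
        e_tt = sum(p * tt for p, tt, _ in scenarios)
        e_at = sum(p * at for p, _, at in scenarios)
        return e_tt, e_at

    scenarios = [
        (0.55, 25.0, 40.0),    # DCI present and not missed
        (0.30, 35.0, 75.0),    # DCI present but completely missed
        (0.15, 45.0, 120.0),   # DCI absent; must be requested
    ]
    print(expected_times(scenarios))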
7.1 Derivation of Event Probabilities

We start by deriving the probabilities of the events that we consider in our analysis, outlined above in Table 2. Note that all of the aforementioned events are valid for both the CBS and the VBS strategies.

Derivation of P{M̄, TI}. Given that a client tunes in to an index segment, the probability that the client does not miss the DCI at all is denoted by P{M̄ | TI}.

\[
P\{\bar{M} \mid TI\} = \sum_{i=1}^{m-1} P\{TI_i \cap (D_{i+1} \cup D_{i+2} \cup \cdots \cup D_m)\} = \sum_{i=1}^{m-1} P\{TI_i\}\, P\{D_{i+1} \cup \cdots \cup D_m\} = \sum_{i=1}^{m-1} \frac{1}{m} \cdot \frac{m-i}{m} = \frac{m-1}{2m} \tag{4}
\]

The probability that the client tunes in to an index segment is denoted by P{TI} (the ratio of the total size of the index segments to the total broadcast size). That is,

\[
P\{TI\} = \frac{mI}{m(I+B)} = \frac{I}{I+B} = \frac{2d/(Sn)}{2d/(Sn) + d/(nm)} = \frac{2m}{2m+S} \tag{5}
\]

\[
P\{\bar{M}, TI\} = P\{\bar{M} \mid TI\}\, P\{TI\} = \frac{m-1}{2m+S} \tag{6}
\]

Derivation of P{M̄, TD}.

\[
P\{\bar{M} \mid TD\} = 1 - P\{M \mid TD\} - P\{PM \mid TD\} = 1 - \left(\frac{m-1}{2m} + \frac{B-S}{mB}\right) - \frac{S}{mB} = \frac{m-1}{2m}
\]

\[
P\{\bar{M}, TD\} = P\{\bar{M} \mid TD\}\, P\{TD\} = \frac{m-1}{2m} \cdot \frac{S}{2m+S} \tag{7}
\]

Derivation of P{M, TI}.

\[
P\{M \mid TI\} = 1 - P\{\bar{M} \mid TI\} = 1 - \frac{m-1}{2m} = \frac{m+1}{2m}, \qquad P\{M, TI\} = P\{M \mid TI\}\, P\{TI\} = \frac{m+1}{2m} \cdot \frac{2m}{2m+S} = \frac{m+1}{2m+S} \tag{8}
\]

Derivation of P{M, TD}. Given that a client tunes in to a data block, the probability that the client misses the DCI completely is denoted by P{M | TD}. As in equation (4),

\[
P\{M \mid TD\} = \sum_{i=1}^{m} P\{M \mid TD, D_i\}\, P\{D_i \mid TD\}
\]

\[
P\{M \mid TD, D_i\} = P\{M \mid j>i, TD, D_i\}\, P\{j>i \mid TD, D_i\} + P\{M \mid j=i, TD, D_i\}\, P\{j=i \mid TD, D_i\} + P\{M \mid j<i, TD, D_i\}\, P\{j<i \mid TD, D_i\}
\]

where j is the index of the data block to which the client tunes in. Thus,

\[
P\{j>i \mid TD, D_i\} = \frac{m-1}{2m}, \qquad P\{j=i \mid TD, D_i\} = \frac{1}{m}, \qquad P\{j<i \mid TD, D_i\} = \frac{m-1}{2m}
\]

Clearly,

\[
P\{M \mid j>i, TD, D_i\} = 1, \qquad P\{M \mid j=i, TD, D_i\} = \frac{B-S}{B}, \qquad P\{PM \mid j=i, TD, D_i\} = \frac{S}{B}
\]

Hence,

\[
P\{M \mid TD, D_i\} = \frac{m-1}{2m} + \frac{B-S}{B} \cdot \frac{1}{m}
\]

\[
P\{M \mid TD\} = \sum_{i=1}^{m} \left( \frac{m-1}{2m} + \frac{B-S}{mB} \right) \frac{1}{m} = \frac{m-1}{2m} + \frac{B-S}{mB} = \frac{mB + B - 2S}{2mB}
\]

\[
P\{M, TD\} = P\{M \mid TD\}\, P\{TD\} = \frac{mB + B - 2S}{2mB} \cdot \frac{S}{2m+S} \tag{9}
\]

Derivation of P{PM, TD}.

\[
P\{PM \mid TD, D_i\} = P\{PM \mid j=i, TD, D_i\}\, P\{j=i \mid TD, D_i\} = \frac{S}{B} \cdot \frac{1}{m} = \frac{S}{mB}
\]

where j is the index of the data block to which the client tunes in.

\[
P\{PM \mid TD\} = \sum_{i=1}^{m} \frac{S}{mB} \cdot \frac{1}{m} = \frac{S}{mB}, \qquad P\{PM, TD\} = P\{PM \mid TD\}\, P\{TD\} = \frac{S}{mB} \cdot \frac{S}{2m+S} \tag{10}
\]

Table 3 summarizes the probabilities derived above. Note that in the subsequent proofs, we shall repeatedly use the probabilities from Table 3.

Event        Probability
P{M̄, TI}    (m-1)/(2m+S)
P{M̄, TD}    ((m-1)/(2m)) · (S/(2m+S))
P{M, TI}     (m+1)/(2m+S)
P{M, TD}     ((m-1)/(2m) + (B-S)/(mB)) · (S/(2m+S))
P{PM}        (S/(mB)) · (S/(2m+S))

Table 3. Probabilities of Events
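As a sanity check, the probabilities of Table 3 can be evaluated numerically and verified to sum to one; a short sketch (the parameter values are illustrative only):

    # Numerical check of Table 3 for given m (index segments), S (selectivity,
    # in buckets) and B (data block size, in buckets); values are examples.

    from fractions import Fraction as F

    def event_probs(m, S, B):
        p_ti = F(2 * m, 2 * m + S)                      # P{TI}, equation (5)
        p_td = F(S, 2 * m + S)                          # P{TD} = 1 - P{TI}
        return [
            F(m - 1, 2 * m) * p_ti,                     # P{not missed, TI}  (6)
            F(m - 1, 2 * m) * p_td,                     # P{not missed, TD}  (7)
            F(m + 1, 2 * m) * p_ti,                     # P{missed, TI}      (8)
            (F(m - 1, 2 * m) + F(B - S, m * B)) * p_td, # P{missed, TD}      (9)
            F(S, m * B) * p_td,                         # P{partial miss}   (10)
        ]

    probs = event_probs(m=4, S=20, B=50)
    print([float(p) for p in probs], float(sum(probs)))  # the events sum to 1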
7.2 Analysis of Constant Broadcast Size Strategy

We first present the results of our analysis of the CBS strategy. For each Scenario i, we present the probability of encountering it (P{Scenario i}) and expressions for the tuning time and access time for that scenario, i.e., TT_i and AT_i. Our results are presented as a series of lemmas. While there are seven cases presented in section 6, there exists a substantial amount of commonality between the cases, and they may be aggregated. In the ensuing analysis, we treat the seven cases through six scenarios, termed Scenario 1, Scenario 2, ..., Scenario 6 respectively. Each of these scenarios maps to particular cases described in section 6. Specifically, Scenario 1 (presented in Lemma 3) maps to Cases 1 and 4. Scenario 2 (presented in Lemma 4) and Scenario 3 (presented in Lemma 5) map to Cases 2 and 5, while Scenario 4 (presented in Lemma 6) and Scenario 5 (presented in Lemma 7) map to subcases of Case 7. Scenario 6 (presented in Lemma 8) maps to Cases 3 and 6.

We make two special assumptions in our analysis of the CBS strategy: (a) We assume that items are priority ranked based purely on popularity, i.e., the notion of Ignore Factor (IF) is not considered. This was done because it was very difficult to include IF in a mathematically meaningful way in a single analysis; we do include IF in our simulation experiments. (b) We assume that when a request is made, the requesting client will receive her DCI before exiting the cell. Further, for simplicity, we assume that P{DCI ∈ BR_i}, the probability that a requested DCI is in the ith broadcast, is independent of P{DCI ∈ BR_{i+1}}; thus, for a given DCI, P_in remains the same throughout all broadcasts.

Definition 1. Expected Waiting Time, denoted by E[T_w], is the average amount of time (in number of buckets) that elapses between the submission of a request for a DCI by a client and the start of downloading of that DCI from a subsequent broadcast by the client.

Lemma 1. The Expected Waiting Time (E[T_w]) is given by

\[
E[T_w] = \sum_{j=0}^{\infty} \left(\tfrac{1}{2} + j\right) m(I+B) \left( p_h (1-p_{h,in})^j\, p_{h,in} + p_c (1-p_{c,in})^j\, p_{c,in} \right) \tag{11}
\]

Proof: See Appendix.

Definition 2. Pos(i) is a function that returns the ordinal position of an element i in an ordered list A, given that i is an element of A.

Lemma 2. Under the Constant Broadcast Size strategy, the probability that a DCI exists in the broadcast that the client initially tunes in to, denoted by P_in, is

\[
P_{in} = \sum_{\{B_h\}} \frac{(\beta_1 + \cdots + \beta_b)!}{\prod_{i=1}^{b} \beta_i!} \prod_{i=1}^{b} p_h^{\beta_i} + \sum_{\{B_c\}} \frac{(\beta_1 + \cdots + \beta_b)!}{\prod_{i=1}^{b} \beta_i!} \prod_{i=1}^{b} p_c^{\beta_i} \tag{12}
\]

where B_h = {(β_1, β_2, ..., β_b) | i ∈ H, Pos(i) ∈ [1, C_CBS]}, B_c = {(β_1, β_2, ..., β_b) | i ∈ C, Pos(i) ∈ [1, C_CBS]}, and β_i is the number of requests for item i. H and C are the hot and cold item sets, respectively.

Proof: See Appendix.

Lemma 3. Let Scenario 1 denote the scenario where the DCI is in the current broadcast and the client did not miss it. Under Scenario 1, the following expressions hold.

\[
P\{\text{Scenario 1}\} = P_{in} \left( \frac{m-1}{2m+S} + \frac{m-1}{2m} \cdot \frac{S}{2m+S} \right) \tag{13}
\]

\[
TT_1 = 1 + l + S \tag{14}
\]

\[
E[AT_1 \mid \text{Scenario 1}] = P_{in}\, \frac{m-1}{2m+S} \left( \frac{I}{2} + B + I + L_1 + S + \left( \frac{B}{2} + I + L_1 + S \right) \frac{S}{2m} \right) \tag{15}
\]

Proof: See Appendix.

Lemma 4. Let Scenario 2 denote the scenario where the DCI is in the current broadcast but the client completely missed it, and the DCI appears in the next broadcast. Under Scenario 2, the following expressions hold.

\[
P\{\text{Scenario 2}\} = P_{in}^2 \left( \frac{m+1}{2m+S} + \left( \frac{m-1}{2m} + \frac{B-S}{mB} \right) \frac{S}{2m+S} \right) \tag{16}
\]

\[
TT_2 = 1 + 2l + S \tag{17}
\]

\[
E[AT_2 \mid \text{Scenario 2}] = P_{in}^2\, \frac{1}{2m+S} \Big( (m-1)\Big( \frac{3I}{2} + B + L_2 + \frac{m(I+B)}{2} + S \Big) \tag{18}
\]
\[
\qquad + \Big( \frac{m-1}{2m} + \frac{B-S}{mB} \Big) S \Big( \frac{B}{2} + I + L_2 + \frac{m(I+B)}{2} + S \Big) \Big) \tag{19}
\]

Proof: See Appendix.

Lemma 5. Let Scenario 3 denote the scenario where the DCI is in the current broadcast but the client completely missed it; furthermore, the DCI will not appear in the next broadcast, and thus the client must submit a request to the server. Under Scenario 3 the following expressions hold, assuming that the DCI will appear in a broadcast before the client exits the cell.

\[
P\{\text{Scenario 3}\} = P_{in}(1 - P_{in}) \left( \frac{m+1}{2m+S} + \left( \frac{m-1}{2m} + \frac{B-S}{mB} \right) \frac{S}{2m+S} \right) \tag{20}
\]

\[
TT_3 = 1 + \frac{E[T_w]}{m(I+B)}\, l + S \tag{21}
\]

\[
AT_3 = m(I+B) \left( \frac{E[T_w]}{m(I+B)} + 1 \right) + S \tag{22}
\]

Proof: See Appendix.
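The term E[T_w]/(m(I+B)) appearing above is the expected number of broadcast periods a requesting client waits. The series in Equation 11 converges geometrically, so it is easy to evaluate by truncation; a minimal numerical sketch (the probability values are illustrative):

    # Truncated evaluation of Equation 11; parameter values are examples only.

    def expected_wait(m, I, B, ph, pc, ph_in, pc_in, terms=1000):
        period = m * (I + B)                      # broadcast period (buckets)
        total = 0.0
        for j in range(terms):
            hot = ph * (1 - ph_in) ** j * ph_in   # wait j broadcasts, hot item
            cold = pc * (1 - pc_in) ** j * pc_in  # wait j broadcasts, cold item
            total += (0.5 + j) * period * (hot + cold)
        return total

    # In the model ph + pc = 1; the in-broadcast probabilities come from Lemma 2.
    print(expected_wait(m=4, I=20, B=50, ph=0.8, pc=0.2, ph_in=0.9, pc_in=0.3))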
Lemma 6. Let Scenario 4 denote the scenario where the client partially missed the DCI in the current broadcast, and the DCI reappears in the next broadcast. Under Scenario 4, the following expressions hold.

\[
P\{\text{Scenario 4}\} = P_{in}^2\, \frac{S}{mB} \cdot \frac{S}{2m+S} \tag{23}
\]

\[
TT_4 = \frac{S}{2} + l + \frac{S}{2} \tag{24}
\]

\[
AT_4 = m(I+B) \tag{25}
\]

Proof: See Appendix.

Lemma 7. Let Scenario 5 denote the scenario where the client partially missed the DCI in the current broadcast, but the DCI does not appear in the next broadcast (it appears before the client exits the cell). Under Scenario 5, the following expressions hold.

\[
P\{\text{Scenario 5}\} = P_{in}(1 - P_{in})\, \frac{S}{mB} \cdot \frac{S}{2m+S} \tag{26}
\]

\[
TT_5 = \frac{S}{2} + l + \frac{E[T_w]}{m(I+B)}\, l + S \tag{27}
\]

\[
AT_5 = m(I+B) \left( \frac{E[T_w]}{m(I+B)} + 1 \right) + S \tag{28}
\]

Proof: See Appendix.

Lemma 8. Let Scenario 6 denote the scenario where the DCI is not in the current broadcast but appears before the client exits the cell. Under Scenario 6, the following expressions hold.

\[
P\{\text{Scenario 6}\} = 1 - P_{in} \tag{29}
\]

\[
TT_6 = 1 + l + \frac{E[T_w]}{m(I+B)}\, l + S \tag{30}
\]

\[
AT_6 = m(I+B) \left( \frac{E[T_w]}{m(I+B)} + 1 \right) + S \tag{31}
\]

7.3 Analysis of Variable Broadcast Size Strategy

In this section, we present our analysis of the variable broadcast size strategy in a series of scenarios. As in the CBS analysis presented in the previous section, these scenarios, described through a series of lemmas, are aggregations of the cases presented in section 6. Specifically, Scenario 7 (Lemma 13) refers to Case 1, Scenario 8 (Lemma 14) to Case 4, Scenario 9 (Lemma 15) to Case 2, Scenario 10 (Lemma 16) to Case 5, and Scenario 11 (Lemma 17) to Case 7. Case 3 and Case 6 are aggregated in Scenario 12 (Lemma 18). Note that E[TT] and E[AT] are calculated by multiplying the probability terms in Scenarios 7 through 11 by P{DCI ∈ BR}. Further note that in the abovementioned lemmas we do not derive the AT and TT expressions, as in the CBS case; recall that we provided a sample derivation of AT and TT in Lemma 3. We remind the reader at this point that the VBS server protocol will always place requested DCIs in the subsequent broadcast. Therefore, if a client's DCI is not in the current broadcast, she is guaranteed to find it in the next broadcast.

Lemma 9. Let L_1 denote the expected distance from the end of the index to the DCI, given that the DCI is not missed. L_1 is expressed as

\[
L_1 = E\{\text{Distance to DCI} \mid \bar{M}\} = \frac{B(m-1)}{4m} + \frac{(I+B)(m-1)(m-2)}{6m} \tag{32}
\]

Proof: See Appendix.

Lemma 10. Let L_2 denote the expected distance from the end of the index to the next broadcast, given that the DCI is missed. L_2 is expressed as

\[
L_2 = E\{\text{Distance to next broadcast} \mid M\} = \frac{1}{2m} \left( (m-1)B + (I+B)\, \frac{2m^2 - 6m + 3}{6} \right) \tag{33}
\]

Proof: See Appendix.

Lemma 11. Let P{DCI ∈ BR} denote the probability that the DCI is in the current broadcast under VBS. P{DCI ∈ BR} is given by

\[
P\{DCI \in BR\} = \left( 1 - \Big( 1 - \frac{(1-p_0)p_h}{bh} \Big)^N \right) p_h + \left( 1 - \Big( 1 - \frac{(1-p_0)p_c}{b(1-h)} \Big)^N \right) p_c \tag{34}
\]

Proof: See Appendix.

Lemma 12. Let BL denote the length of the broadcast under VBS. The expectation of BL is defined as

\[
E\{BL\} = \sum_{j=0}^{d} jS\, P\{BL = jS\} \tag{35}
\]

where

\[
P\{BL = jS\} = P\{j \text{ different DCIs are requested by the } N \text{ clients}\} = \sum_{i=0}^{j} \binom{bh}{i} \binom{b(1-h)}{j-i} \sum_{A} \frac{N!}{\prod_{q=1}^{j} \alpha_q!} \left( \frac{(1-p_0)p_h}{bh} \right)^{i + \sum_{q=1}^{i} \alpha_q} \left( \frac{(1-p_0)p_c}{b(1-h)} \right)^{j-i+\sum_{q=i+1}^{j} \alpha_q}
\]

and

\[
A = \left\{ (\alpha_1, \ldots, \alpha_j) : \alpha_q \geq 0 \text{ for all } q \in [1, j],\ \sum_{q=1}^{j} \alpha_q = N - j \right\}
\]

where α_q is the number of requests for the qth included item beyond the first.
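Lemma 11 can be evaluated directly; a small sketch (the parameter values are illustrative):

    # Probability that a client's DCI is in the current VBS broadcast (Eq. 34).

    def p_dci_in_broadcast(N, b, h, p0, ph, pc):
        hot_hit = 1 - (1 - (1 - p0) * ph / (b * h)) ** N         # hot item requested
        cold_hit = 1 - (1 - (1 - p0) * pc / (b * (1 - h))) ** N  # cold item requested
        return hot_hit * ph + cold_hit * pc

    # Example: 15 clients, 10 DCIs, 20% hot, 10% of clients request nothing.
    print(p_dci_in_broadcast(N=15, b=10, h=0.2, p0=0.1, ph=0.8, pc=0.2))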
Lemma 13. Let Scenario 7 denote the scenario where the client tunes in to an index bucket and does not miss the DCI in the current broadcast. Under Scenario 7, the following expressions hold.

\[
P\{\text{Scenario 7}\} = \frac{m-1}{2m+S} \tag{36}
\]

\[
TT_7 = 1 + l + S \tag{37}
\]

\[
AT_7 = \frac{I}{2} + B + I + L_1 + S \tag{38}
\]

Proof: See Appendix.

Lemma 14. Let Scenario 8 denote the scenario where the client tunes in to a data bucket before the data block containing the DCI in the current broadcast. Under Scenario 8, the following expressions hold.

\[
P\{\text{Scenario 8}\} = \frac{m-1}{2m} \cdot \frac{S}{2m+S} \tag{39}
\]

\[
TT_8 = 1 + l + S \tag{40}
\]

\[
AT_8 = \frac{B}{2} + I + L_1 + S \tag{41}
\]

Proof: See Appendix.

Lemma 15. Let Scenario 9 denote the scenario where the client tunes in to an index bucket but missed the DCI in the current broadcast. Under Scenario 9, the following expressions hold.

\[
P\{\text{Scenario 9}\} = \frac{m+1}{2m+S} \tag{42}
\]

\[
TT_9 = 1 + 2l + S \tag{43}
\]

\[
AT_9 = \frac{I}{2} + B + I + L_2 + \frac{m(I+B)}{2} + S \tag{44}
\]

Proof: See Appendix.

Lemma 16. Let Scenario 10 denote the scenario where the client tunes in to a data bucket but missed the DCI in the current broadcast. Under Scenario 10, the following expressions hold.

\[
P\{\text{Scenario 10}\} = \frac{mB + B - 2S}{2mB} \cdot \frac{S}{2m+S} \tag{45}
\]

\[
TT_{10} = 1 + 2l + S \tag{46}
\]

\[
AT_{10} = \frac{B}{2} + I + L_2 + \frac{m(I+B)}{2} + S \tag{47}
\]

Proof: See Appendix.

Lemma 17. Let Scenario 11 denote the scenario where the client tunes in to a data bucket in the middle of the DCI in the current broadcast. Under Scenario 11, the following expressions hold.

\[
P\{\text{Scenario 11}\} = \frac{S}{mB} \cdot \frac{S}{2m+S} \tag{48}
\]

\[
TT_{11} = \frac{S}{2} + l + \frac{S}{2} \tag{49}
\]

\[
AT_{11} = m(I+B) + \frac{S}{4} \tag{50}
\]

Proof: See Appendix.

Lemma 18. Let Scenario 12 denote the scenario where the client tunes in to either a data bucket or an index bucket, given that the DCI does not exist in the current broadcast. Under Scenario 12, the following expressions hold.

\[
P\{\text{Scenario 12}\} = 1 - \left( \left( 1 - \Big( 1 - \frac{(1-p_0)p_h}{bh} \Big)^N \right) p_h + \left( 1 - \Big( 1 - \frac{(1-p_0)p_c}{b(1-h)} \Big)^N \right) p_c \right) \tag{51}
\]

\[
TT_{12} = 1 + 2l + S \tag{52}
\]

\[
AT_{12} = m(I+B) + S \tag{53}
\]

Proof: See Appendix.

8. DISCUSSION

Based on our analyses of both the constant broadcast size and the variable broadcast size strategies, we plotted the tuning time (TT) and access time (AT) curves for various parameters. In this section, we illustrate our strategies with a small-scale example, primarily due to the computational costs involved. Although the database used in the examples consists of 200 buckets, we believe it is large enough to capture the essence of the strategies. Initially, our parameters are: b = 10, S = 20, n = 1, p_h = 0.8, h = 0.2, c = 0.8, m = 4 for VBS, and C_CBS = 4 for CBS.

8.1 Discussion of the AT and TT curves

In figures 5A and B, we illustrate the changes in the tuning and access times of the two strategies that we proposed. In both figures, we vary the number of clients in the cell, keeping the other parameters constant.

Tuning Time (TT) curves: We first turn our attention to the variable broadcast size (VBS) strategy. The VBS strategy includes every requested item in the broadcast. As the number of clients increases, the set of requested items grows as well. As a result, the size of the index segments in VBS increases. Clearly, TT is proportional to the size of the index; thus in VBS, TT increases as the number of clients increases. This explains the upward trend of the TT curve for VBS in figure 5A. Another interesting feature of the TT curve for VBS is that the slope decreases along the curve. It is easily seen that the slope of this curve is proportional to the rate of growth of the requested item set, i.e., the faster the growth of the demand set, the faster the growth of the tuning time. When the cell is sparsely populated, each additional user has a high probability of asking for new items. However, under heavier population loads, new arrivals tend to ask for items already in the request set, which slows down the growth of this curve. We call such requests redundant requests.
Fig. 5. Experimental Results: [A] TT for VBS and CBS; [B] AT for VBS and CBS

We now turn our attention to the second curve in figure 5A: the curve for the constant broadcast size (CBS) strategy. In the CBS strategy, the index size is limited because the broadcast size is limited. As the number of clients increases, the number of redundant requests, particularly for hot items, increases. Therefore, the probability that a requested item is in the broadcast (p_b) increases, lowering TT for larger client sets. For example, for N = 10, p_b is 0.830155, whereas for N = 12 it increases to 0.840821. Here, in the case where N = 12, clients have a higher probability of downloading the DCI from the current broadcast than in the case where N = 10. Clients who download the DCI from the current broadcast need not submit a request and tune in to the index of successive broadcasts, thus saving tuning time. As the number of clients increases, the number of non-redundant requests also increases, which slows the decrease in TT, as illustrated in figure 5A.

Having discussed the individual TT curves, we now focus on explaining the relative performance of the two strategies. With the current set of parameters, VBS performs slightly better than CBS for low to moderate loads in the system. However, as the load, i.e., the number of clients, increases, CBS performs better. Intuitively, this makes sense. When the number of clients grows very large, VBS will include all their requests, which will include many cold items as well as hot items. This causes indices to grow to a point where TT becomes very large. By virtue of limiting the size of the index, CBS does not suffer from this problem.

Access Time (AT) curves: In the variable broadcast size strategy, AT responds to an increasing number of clients in the same way as TT. It increases as the number of clients increases, but the increase slows as redundancy in the request set increases. This can be seen readily in figure 5B.

Initially, in the constant broadcast size strategy, the server does not receive enough requests to fill the broadcast, causing dead air in the broadcast. Clients are forced to submit requests for their DCIs and wait for the next broadcast. This contributes to the relatively high initial AT. As the broadcast starts to fill up, the effect of the increasing number of clients on AT decreases. Hot items are likely to be requested by many clients, and are therefore likely to be in the current broadcast when a client tunes in. Figure 5B shows the curve for CBS.

The two curves in figure 5B behave differently at different load levels. The VBS curve performs better than the CBS curve when the system load is low. For moderate to high loads in the system, VBS AT increases sharply while CBS AT remains fairly constant. In this regime, CBS provides lower access times for clients than VBS. Overall, it can be deduced that when the system load is low, VBS outperforms CBS for both TT and AT. For moderate loads, there is a trade-off between the two strategies: VBS performs better for TT and CBS performs better in terms of AT. For high loads in the system, CBS dominates VBS, providing lower TT and AT.
8.2 Changes in TT and AT for VBS as the number of indices in the broadcast varies

The number of index segments in the broadcast (m) is the most important parameter for VBS because the broadcast size is not limited. Increasing m provides a larger number of opportunities for clients to read the index, which tends to lower tuning times, but increases the overall size of the broadcast, which increases access times. Therefore, we plotted the TT and AT curves to compare the effects of various m values. Figures 6A and B illustrate the changes in the TT and AT curves, respectively, as m changes.

TT curves: Figure 6A shows that the change in m does not have a significant effect on TT. Since the data in the broadcast remains constant for a given client population, the index size remains constant as well. Slight differences in the curves are mainly due to round-up errors, and the TT curves converge to a single curve.

AT curves: At a given number of clients, even if the m value is different, the amount of data in the broadcast is the same. However, for different values of m, the overall broadcast size changes, since we change the number of index segments. Therefore, if m is increased, the size of the broadcast grows, thus increasing the AT.

Fig. 6. Experimental Results: [A] TT for VBS with various m values; [B] AT for VBS with various m values

8.3 Changes in TT and AT for CBS as the broadcast capacity varies

The crucial parameter in the Constant Broadcast Size strategy is the capacity of the broadcast, denoted by C_CBS. Limiting the capacity of the broadcast and favoring more popular items for inclusion reduces the access and tuning times for hot items, at the expense of colder items. The basic premise of the strategy is that with a "good" value of C_CBS, both access and tuning times may be reduced. In figures 7A and B, we illustrate the effects of changing the broadcast capacity (C_CBS) on TT and AT for the constant broadcast size strategy. Note that in the curves the parameter Capacity represents C_CBS.

TT curves: Here, we turn our attention to figure 7A, and observe the effects of changing the broadcast capacity on TT. For C_CBS = 2, p_b increases as the number of clients in the cell increases. However, a broadcast capacity of 2 DCIs is extremely small. In this scenario, hot items are continually included in the broadcast, causing the exclusion of cold items. Clients requesting cold items are forced to tune in to each successive broadcast until they exit the cell or the request becomes invalid. This results in a considerable increase in TT.

Fig. 7. Experimental Results: [A] TT for CBS with various C_CBS values; [B] AT for CBS with various C_CBS values

For C_CBS = 4, p_b is higher than in the case where C_CBS = 2.
Both hot and cold items have a higher chance of being included in the broadcast as the capacity of the broadcast grows. Therefore, clients are less starved, and the Expected Waiting Time is lower than in the case where C_CBS = 2. The number of redundant requests, particularly for hot items, increases as the number of clients increases, causing a slight decrease in the TT curve.

The curve for C_CBS = 8 is similar to the curve for C_CBS = 4. Its location above the C_CBS = 4 curve may seem counterintuitive at first; we would expect the tuning time to decrease even more by increasing the size of the broadcast. However, this is an excellent example of an unnecessarily high broadcast size. The server does not receive enough requests to fill the broadcast capacity, and broadcasts dead air.

AT curves: Expected Waiting Time is the most significant part of the AT, because clients who have missed their DCIs, or whose DCIs are not included in the current broadcast, must wait, at a minimum, until the next broadcast to retrieve their DCIs. This waiting time is not considered part of TT. The curves for AT in figure 7B are similar to those shown for TT in figure 7A. It can be concluded that, in order for CBS to perform at its best, the broadcast size parameter should be adjusted to the load in the system. This result emphasizes the importance of the adaptive capability of broadcasting strategies.

9. SIMULATION MODEL

While the results of the previous section were informative, our analytical study is subject to certain limitations, e.g., we omitted consideration of the Ignore Factor in order to make our analysis mathematically tractable. In order to overcome these limitations, we performed an empirical performance evaluation by means of an extensive simulation study. Our simulation programs were written in SIMPACK [Fishwick 1995], a C/C++ based simulation toolkit. Our simulation model contains three major components: (1) a database component, (2) a client component, and (3) a server component.

9.1 Database and Broadcast Model

Our database consists of DatabaseSize data objects (items). Information about each item is represented by a certain number of fixed-size records. As explained earlier in the paper, the records corresponding to a given data item form a data cluster in the broadcast. This structure was motivated by real-life examples. For example, in the stock market domain, data items would correspond to particular ticker symbols, while records would be the information (such as price, volume, closing price of the previous day, etc.) corresponding to these ticker symbols in various markets. Typically, stocks are traded in a fixed number of markets (e.g., Company X might be traded in 50 markets around the world); thus, for a particular data item there would be a fixed number of records, and a specific record would always convey the same basic information (e.g., the third record in the IBM cluster corresponds to IBM stock information on the Frankfurt stock market). The number of records corresponding to a data item is generated from a truncated normal distribution within the range [MinRecordsPerItem, MaxRecordsPerItem], with a mean of MeanRecordsPerItem and a standard deviation of StdRecordsPerItem. Note that this determination (i.e., the number of records per item) is done only once in the simulation and remains fixed thereafter. The sizes of data and index records in bytes are denoted by DataRecordSize and IndexRecordSize respectively.

Our database model assumes locality of reference.
A fraction of items, defined by the parameter PercentHot, are designated as hot items. These items will be more heavily requested by clients than cold items. In our model a client has a specific data item of interest. When a client arrives in the cell, she requests a hot item PercentChooseHot fraction of the time. Once it is decided whether a client requests a hot or a cold item, the specific item is picked randomly from the corresponding set.

The broadcast is modeled as shown in figure 3 and is characterized as a number of fixed-size buckets of capacity BucketSize bytes. Each bucket contains a header of tuning information: each index bucket contains a header of IndexHeaderSize bytes, while data buckets contain headers of DataHeaderSize bytes. Thus, a data cluster of D records fits into ⌈D × DataRecordSize / (BucketSize − DataHeaderSize)⌉ buckets. For indexing the broadcast, we use a (1,m) indexing scheme as described in [Imielinski et al. 1997], and organize the index tuples in a B-tree format. Clearly, each index bucket contains ⌊(BucketSize − IndexHeaderSize) / IndexRecordSize⌋ index records. In the case of the constant size strategy, the broadcast size is characterized as BroadcastSize seconds. In the case of the variable size strategy, the size of the broadcast is determined by the number of data items requested.
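The bucket arithmetic above is simple ceiling and floor division; for concreteness, a minimal sketch using the baseline byte sizes listed later in Table 5 (the constant names are ours):

    # Cluster and index bucket capacities from the formulas above, evaluated
    # with the baseline byte sizes of Table 5 (section 10).

    import math

    BUCKET_SIZE        = 128   # bytes
    DATA_HEADER_SIZE   = 8
    INDEX_HEADER_SIZE  = 8
    DATA_RECORD_SIZE   = 40
    INDEX_RECORD_SIZE  = 5

    def data_cluster_buckets(records):
        """Buckets needed for a data cluster of the given number of records."""
        payload = BUCKET_SIZE - DATA_HEADER_SIZE
        return math.ceil(records * DATA_RECORD_SIZE / payload)

    def index_records_per_bucket():
        """Index records that fit in one index bucket."""
        return (BUCKET_SIZE - INDEX_HEADER_SIZE) // INDEX_RECORD_SIZE

    print(data_cluster_buckets(150))    # 150 * 40 / 120 -> 50 buckets
    print(index_records_per_bucket())   # 120 // 5 -> 24 records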
The database is updated asynchronously. We are only concerned with what fraction of buckets, per data cluster, have their contents changed (i.e., are dirtied) per unit time. In other words, we are concerned with associating with each data item D_i a value X_i that models how the contents of D_i change. This is modeled using a stochastic variable drawn from a truncated normal distribution in the range [MinChangeFrac, MaxChangeFrac], with a mean of MeanChangeFrac and a standard deviation of StdChangeFrac. Note that if more than one broadcast passes between two successive inclusions of a data item, we are not concerned with modeling what fraction of the buckets has changed, as our retrieval protocols will assume that all data has changed and will consequently download the entire data cluster.

9.2 Client Model

Clients arrive as a Poisson process with rate ArrivRate. We vary ArrivRate to vary the load in the system. The length of time a client remains in the cell is normally distributed (truncated on the left), with a mean of MeanResidenceLatency, a standard deviation of StdResidenceLatency, and a minimum residence of MinResidenceLatency. Note that during a client's stay, he is continually consuming valuable battery resources (at varying rates, as pointed out later in this section). In order to show the energy-conserving effects of our algorithms, we track the energy consumed by clients in the course of their residence in the cell.

To track the energy consumption of clients we have constructed a detailed cost model. Each mobile client machine is equipped with various energy-consuming components, some of which are more significant than others. Our client model assumes four major components which consume energy at varying rates: the disk drive, the CPU, the display unit, and the mobile data card (MDC). We make two simplifying assumptions: (1) Clients remain powered up for their entire residence in the cell; therefore, we do not consider the cost of powering up the mobile unit. (2) Other components of the mobile unit, e.g., random access memory, consume power at a constant rate, and are therefore of little interest.

First, we describe each of the four components of interest, their possible states, energy consumption in those states, and the events that trigger state changes. We calculate energy consumption based on the equation ENERGY (Joules) = POWER (Watts) × TIME (seconds). Table 4 shows typical energy consumption rates for these components and states, based on existing product specifications and literature [Inc. 1997; Seagate Technology 1997]:

Component   State      Rate of Consumption
CPU         Active     0.25 Watts
CPU         Sleep      0.00005 Watts
Disk        Write      0.95 Watts
Disk        Idle       0.65 Watts
Disk        Sleep      0.015 Watts
Display     Active     2.5 Watts
Display     Off        0 Watts
MDC         Transmit   0.4-0.6 Watts
MDC         Receive    0.2 Watts
MDC         Sleep      0.1 Watts

Table 4. Component States and Power Consumption

Disk Drive. In this simulation we consider diskful clients. In the diskful state we assume data are cached on disk, and we are interested in disk operations and the corresponding energy usage. The disk drive of the mobile unit exists in one of three states: read/write (RW), idle (I), and sleep (S). In RW mode, the head is actively reading or writing data on the disk. In I mode, the disk is spinning, but the head is not actively reading or writing. In S mode, the disk is not spinning and consumes energy at a very low rate. We assume that the disk drive is in the I state upon the client's arrival in the cell. It enters RW mode at the start of a data download from the broadcast: while data is being downloaded from the broadcast, it is simultaneously being written to the disk-resident cache. Once the data write is over, the disk does not enter the S state immediately (in keeping with existing disk models) [Li et al. 1993]. Rather, it reverts to the I state for a short period (this is usually a tunable parameter in modern disk drives and varies between 30 seconds and 5 minutes). In our simulation, we model this phenomenon with the parameter IdlePeriod. At the end of IdlePeriod time, if no additional disk I/O has occurred, the disk stops spinning and enters the S state. If the disk is in the S state as the time approaches for a broadcast download, the disk enters the I state in preparation for the download. Waking the disk from S mode is expensive in that it requires a significant expenditure of time and energy: spinning up the disk takes SpinUp seconds at a rate of consumption of 3.0 Watts.

We note here that disk I/O may occur at a slower rate than the broadcast transmission rate. Therefore, writing to disk may end at a later time than the broadcast read ends, even though the broadcast read and the disk write commence (approximately) simultaneously.¹

CPU. The mobile unit's CPU can exist in one of two states: active (A) and sleep (S). The CPU is in A mode when any operation is taking place in the mobile unit, i.e., when any other component is in an active state. Otherwise, the CPU slips into S mode.

Display. The display component can exist in one of two states: on (N) and off (F). In the N state, the display unit is actively displaying data on the screen. In F mode, the unit displays no data and consumes no power. We assume that the display is in the F state upon the client's arrival in the cell. The display enters the N state while the mobile unit is downloading the DCI from the broadcast, and returns to the F state at the completion of the data download.
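For all components, the accounting reduces to integrating the Table 4 rates over the time spent in each state. The sketch below is our own illustration: the trace format is invented, and we take 0.5 Watts as a representative point in the MDC's 0.4-0.6 Watt transmit range.

    # ENERGY (Joules) = POWER (Watts) x TIME (seconds), summed over intervals.
    # Rates are from Table 4; the trace format is illustrative.

    RATES = {
        ("cpu", "active"): 0.25,  ("cpu", "sleep"): 0.00005,
        ("disk", "write"): 0.95,  ("disk", "idle"): 0.65, ("disk", "sleep"): 0.015,
        ("display", "on"): 2.5,   ("display", "off"): 0.0,
        ("mdc", "transmit"): 0.5, ("mdc", "receive"): 0.2, ("mdc", "sleep"): 0.1,
    }

    def energy(trace):
        """trace: list of (component, state, seconds) intervals -> Joules."""
        return sum(RATES[comp, state] * secs for comp, state, secs in trace)

    # One index probe, a sleep period, then a data download written to disk.
    trace = [
        ("mdc", "receive", 2.0), ("mdc", "sleep", 60.0), ("mdc", "receive", 10.0),
        ("disk", "write", 10.0), ("disk", "idle", 30.0), ("disk", "sleep", 22.0),
        ("display", "on", 10.0), ("cpu", "active", 22.0), ("cpu", "sleep", 50.0),
    ]
    print(f"{energy(trace):.2f} J")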
Mobile Data Card. The mobile data card (MDC) has three possible states: transmit (T), receive (R), and sleep (S). In T mode, the unit is transmitting a message to the server. In R mode, the unit is reading from the broadcast. In S mode, the unit is neither transmitting nor receiving. We assume that the mobile data card is in the R state upon the client's arrival in the cell, for the initial probe. The MDC enters the T state to send requests to the server, and returns immediately to the S state upon ending transmission. It enters the R state for index reads and data reads, and returns to the S state immediately upon completion of the read. We assume that each mobile unit is equipped with a wireless modem for accessing bandwidth on air. We assume a data rate of 19.2 kbps, the rate at which clients both receive data from the broadcast and transmit requests to the server. We base our model of the MDC on Motorola's wireless PCMCIA modem [Inc. 1997].

9.3 Server Component

To model the server, we simply implemented the broadcast generation algorithms (i.e., the server-side protocols) described earlier. We are not interested in a server-side cost model.

10. PERFORMANCE ANALYSIS

In this section, we present the results of the simulation study of our protocols described in section 6. To see how our results fare in relation to existing protocols, we have also compared our strategies against the (1,m) protocol [Imielinski et al. 1997] proposed by Imielinski, Vishwanathan, and Badrinath. Recall that we borrowed the broadcast organization structure from this strategy; it thus seemed appropriate that we compare our results against those of the (1,m) protocol (which we shall refer to as the IVB strategy from this point on). Note in this connection that this was not an easy task, as the IVB strategy neither decides broadcast content nor determines client retrieval policies. Consequently, we had to impart to the IVB strategy functionalities similar to those of our protocols. Specifically, we had to add both server-side and client-side strategies to the IVB protocol. To make the comparison fair, we use the same server-side protocol in the IVB strategy as we use in our CBS strategy (recall that both are periodic strategies). In other words, both the CBS and IVB protocols end up with virtually identical broadcast content and organization. The only difference is that the IVB index and data bucket headers differ from the CBS bucket headers: in the IVB strategy the bucket headers include the tuning information outlined in [Imielinski et al. 1997], whereas our protocol includes the tuning information discussed in section 5. We also had to decide on what kind of client-side strategies to use for IVB. Again, to make the comparison fair, we decided on using our client-side strategies with one difference: in IVB, entire data clusters are downloaded from successive broadcasts, whereas in our protocol, only the dirty buckets are downloaded (as outlined in section 6.1.2). In summary, therefore, the differences in performance (if any) between IVB and CBS would arise as a result of two factors:

- The "smart" tuning information we use.
- The selective downloading that we propose.

Note that both of the above characteristics let our protocols selectively tune in and out of the broadcast.

¹ There is a very small delay introduced in relaying the information read by the mobile interface unit to the disk writing unit.
Thus, by comparing our CBS strategy with the IVB strategy, we can determine whether this selective probing of the broadcast results in any energy conservation efficiencies. We hasten to add that this comparison will not yield any insights into the efficiencies attained as a result of our server-side broadcast generation algorithms; recall that both the CBS and IVB strategies use the same algorithms to generate broadcast content. Due to the lack of other dynamic broadcast content generation mechanisms, we are unable to provide a comparative study of the benefits of our algorithms. However, we do compare VBS and CBS. Recall that VBS, unlike CBS, does not have an explicit content determination strategy: everything that is requested is included in the broadcast. Thus, this comparison does provide some insights into the pros and cons of our server-side algorithms.

10.1 Performance Metrics

Our primary goal in this performance study is to show the efficient and energy-conserving properties of our protocols. We show these properties through two performance metrics: Access Time (AT) and Normalized Energy Expenditure (NEE).

Access Time (AT): AT measures the efficiency of our protocols. We define AT in the same way that latency is defined in [Imielinski et al. 1997], i.e., it is the amount of time that elapses between the point at which the client begins to look for his DCI in the broadcast and the point at which the client finishes his next download of that DCI. For a client who has just arrived in the cell, and has not yet downloaded his DCI, AT refers to the time between his arrival and the end of his first DCI download. After the initial read, we assume that the client would like to retrieve his DCI from every succeeding broadcast, and therefore immediately begins the process of looking for his DCI in the next broadcast. Thus, for a client who has completed his initial read, AT refers to the period between the end of one DCI download and the end of his next DCI download.

Normalized Energy Expenditure (NEE): NEE measures the energy-conserving ability of our protocols. The basic idea behind NEE is simple: it is the energy spent on average by a client to download a bucket of data. Consider a client i who spent W_i Watt-seconds of energy during her stay in the cell and accessed (downloaded) her data cluster c_i times. Further assume that this client's data cluster of interest (DCI) contained b_i buckets. Then her normalized energy expenditure, NEE_i, would be given by W_i / (c_i b_i). We are really interested in the overall NEE of the cell. Assume that during the course of the simulation we tracked a total of L clients. Then, to compute the overall NEE, we use the following formula:

\[
NEE = \frac{\sum_{i=1}^{L} W_i}{\sum_{i=1}^{L} c_i b_i}
\]

where W_i and c_i are the energy expenditure and number of downloads, respectively, of client i.

Percentage of Clients Starved (PCS): PCS is a measure of the quality of service provided by our protocols. Recall that one of the main goals in designing our protocols was to optimize the service provided to clients interested in hot items while ensuring that all clients, even those interested in cold items, receive their DCIs at least once during their stays in the cell. We define a starved client as one who departs from the cell before reading his DCI. PCS, then, can be defined as follows:

\[
PCS = \frac{NumClientsStarved}{NumClients}
\]
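Both metrics are simple aggregates over the tracked clients; a minimal sketch (the record fields W, c, and b follow the definitions above, but the records themselves are invented):

    # Overall NEE and PCS over all tracked clients (definitions above).

    def overall_nee(clients):
        """Total Joules divided by total buckets downloaded."""
        total_energy = sum(cl["W"] for cl in clients)
        total_buckets = sum(cl["c"] * cl["b"] for cl in clients)
        return total_energy / total_buckets

    def pcs(clients):
        """Fraction of clients who left the cell without reading their DCI."""
        starved = sum(1 for cl in clients if cl["c"] == 0)
        return starved / len(clients)

    clients = [
        {"W": 4200.0, "c": 12, "b": 50},   # illustrative numbers only
        {"W": 3900.0, "c": 9,  "b": 40},
        {"W": 1100.0, "c": 0,  "b": 35},   # a starved client
    ]
    print(overall_nee(clients), pcs(clients))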
Now we report the results of our experiments. We first perform a baseline experiment with some "typical" parameter values. Subsequently, we study the sensitivity of the results to the perturbation of certain "key" parameters. Finally, we consider the quality of service provided by our protocols. All the curves presented in this paper exhibit mean values that have relative half-widths about the mean of less than 10% at the 90% confidence level. Each simulation experiment was run until at least 50000 clients had departed the cell. We only discuss statistically significant differences in the ensuing performance reporting section.

10.2 Baseline Experiments

As mentioned before, we investigate three protocols for client retrieval: the Constant Broadcast Size (CBS) strategy, the Variable Broadcast Size (VBS) strategy, and the Imielinski, Vishwanathan, and Badrinath (IVB) strategy. First, we present the baseline parameter values and some justification of the values chosen. Table 5 shows the selected baseline parameter values. We note here that some parameters are only valid for certain protocols. For example, the parameter BroadcastSize, which represents the length of the broadcast in the periodic protocols CBS and IVB, has no meaning in the VBS protocol, in which the broadcast size grows and shrinks with client demand. Other such examples should be obvious.

Notation                    Baseline Value
DatabaseSize                3000 items
MeanRecordsPerItem          150 records
StdRecordsPerItem           6 records
MinRecordsPerItem           3 records
MaxRecordsPerItem           300 records
DesiredWaitCount            3 broadcasts
PercentHot                  20%
PercentChooseHot            80%
DataRecordSize              40 bytes
IndexRecordSize             5 bytes
DataHeaderSize              8 bytes
IndexHeaderSize             8 bytes
BucketSize                  128 bytes
IndexSegmentsPerBroadcast   4
BroadcastSize               1600 seconds
BucketsPerSecond            18.87 buckets/second
MeanChangeFrac              0.02
StdChangeFrac               0.001
MinChangeFrac               0
MaxChangeFrac               1
TransmissionRate            19.2 kbps
ArrivRate                   variable
MeanResidenceLatency        8000 seconds
StdResidenceLatency         26.67 seconds
IdlePeriod                  30 seconds
SpinUp                      2 seconds

Table 5. Broadcast and Cell Parameters used in the Simulation

We add a few notes to justify some of the values chosen for our parameters.

- DatabaseSize: We chose 3000 items based on the number of companies listed on the NYSE exchange (currently 2,920 companies) [NYSE 1997].
- TransmissionRate: According to Motorola product specifications, 19.2 kbps is currently the highest data rate available for wireless modems [Inc. 1997].
- IdlePeriod, SpinUp: These disk parameters were chosen based on product specifications for Seagate disk drives for laptops [Seagate Technology 1997].

Before we present the baseline results, we would like to acquaint the reader with the general outline of the ensuing discussion. We first discuss the normalized energy expenditure (NEE) metric and present the energy consumption characteristics of the different protocols. In discussing the NEE metric, we first compare the CBS and IVB protocols and then compare the CBS and VBS protocols. Why did we make this separation, rather than discuss all three together? As discussed before, the difference between CBS and IVB is the selective tuning on the part of the CBS clients; the server-side protocols are the same. Conversely, the difference between the CBS and VBS protocols is in the server-side strategies; the client-side protocols are the same. In other words, the comparison between CBS and IVB highlights the efficiency of our client protocols, while the comparison between CBS and VBS illuminates the differences between our server-end algorithms.
In order to showcase this dichotomy, we have decided to separate the presentation of the results. Having discussed the NEE features, we turn our attention to the access time (AT) curves. In discussing the AT curves we follow a format of presentation similar to that of the NEE curves, i.e., we separate the presentation of the CBS vs. IVB protocols from the discussion of the CBS vs. VBS protocols.

Fig. 8. Experimental Results: [A] NEE curves for CBS and IVB hot clients; [B] NEE curves for CBS and IVB cold clients

10.2.1 Comparing Normalized Energy Expenditure for CBS and IVB. We first present how the CBS and IVB protocols fare with respect to energy consumption. Recall that in our model, a client has a specific data item that he is interested in, and that this data item may be a hot item or a cold item. Our intuition was that the type of data item that a client was interested in would markedly impact the client's performance. This intuition was borne out by the experimental results. Accordingly, we present the NEE results for hot and cold clients in figures 8A and B respectively.

NEE hot curves for CBS and IVB. We first discuss the hot NEE curves shown in figure 8A. Notice that both the CBS and IVB curves have the same basic shape: (a) a short period of constant NEE (below an arrival rate of 0.075 clients/second), followed by (b) a gently rising NEE curve (between arrival rates of 0.075 and 0.2), and, finally, (c) a sharply rising trend in NEE values (beyond arrival rates of 0.2 clients/sec). We now explain this shape.

- At low arrival rates, the demand is not high enough to completely fill the broadcast; thus, every requested item can fit into the broadcast at these rates. Moreover, dead air is inserted into whatever unfilled broadcast period may remain after insertion of all requested data items. Thus at these rates, all clients have exactly similar access and wait patterns, explaining the flat nature of the NEE curve at these loads.
- At a certain load, the broadcast becomes full (this occurred at approximately the 0.075 clients/sec load in our experiments). After this point there is contention among the requested items for inclusion. Our broadcast period was chosen such that about 95% of the hot items in the database can fit into the broadcast simultaneously. Thus there is very little "bumping" of hot items by other hot items at these rates. More importantly, at these loads, cold items get seriously bumped, and the Ignore Factor (IF) comes into play (recall our priority computation scheme described in section 5). Because of this factor, occasionally hot items do get bumped by cold items, but this does not happen very often. The net effect is that at these loads, hot clients are likely, but no longer guaranteed, to find their items in every broadcast. This has the effect of a mild increase in waiting times for items. Even while waiting, however, a client spends energy (see the sleep mode power consumption rates in Table 4).
As load increases, progressively fewer buckets get downloaded, as clients are increasingly likely to find their items dropped from broadcasts, for the reasons outlined above. Thus, the energy consumed per bucket increases, explaining the gently rising trend of NEE at these loads.

- As mentioned in the previous items, with increasing load, more and more clients request cold (and hot) items. Hot items are included much more often in the broadcast than cold items (though some cold items occasionally make it into the broadcast). Thus, as load increases, requests for cold items are denied for longer and longer periods of time. Eventually, the load reaches a point where there are a significant number of requests for cold items that have been denied for periods greater than the quality of service measure of the system, termed DesiredWaitCount (DWC) (explained previously in section 6). When this occurs, our priority computation procedure starts awarding high priorities to these cold items, as the Adaptive Scaling Factor values increase, which then start bumping a large number of hot items from the broadcast. This causes hot clients to wait much longer to download their items, causing a large increase in NEE. Moreover, this phenomenon is exacerbated as load levels increase even further, explaining the sharply rising trend of the NEE curve.

Having explained the overall shape of the NEE hot curves, we compare the CBS NEE hot curve to the IVB NEE hot curve. The CBS protocol outperforms the IVB protocol at all load levels. This represents the energy savings obtained from the use of the smart cooperative tuning protocols of the client and the server. In other words, the selective tuning policies we advocate do make sense, and clients retrieving hot items from the database save approximately 1.6 Joules per bucket downloaded. This is non-trivial. For instance, consider that most high-end laptops are powered by Lithium Ion batteries. A Lithium Ion cell outputs 3.7 Volts for 1.250 Ampere-hours [Corporation 1997]. This translates to 16650 Joules of stored energy in one Lithium Ion cell. Now, let us make the conservative assumption that the four components included in our cost model account for 80% of the total energy consumption of a mobile unit, i.e., approximately 13320 Joules of stored energy. This translates to a savings of approximately 400 Joules (approximately 2.5% of the capacity of the Lithium Ion cell) over the course of a CBS client's residence in the cell. This savings allows CBS clients, on average, to monitor the broadcast, i.e., maintain power, through 2100 more buckets of broadcast than the average IVB client. This translates into 268800 additional bytes of data.

NEE cold curves for CBS and IVB. We move on to describe the CBS and IVB NEE cold curves, shown in figure 8B. As with the hot curves, we begin by describing the general trend of the curves. At low loads, at which client demand does not fill the broadcast, NEE cold is exactly similar to NEE hot. This constant trend, in our experiments, is visible at loads less than 0.075 clients/sec. As load increases, the broadcast fills up, and hot items begin to bump cold items from the broadcast, causing an increase in the slope of NEE, as some clients requesting cold items are forced to wait more than one broadcast period for their DCIs to (re)appear. This effect gets exacerbated with increasing load, explaining the rapidly increasing trend exhibited by the NEE curves between loads of 0.075 and 0.2.
NEE cold curves for CBS and IVB. We move on to describe the CBS and IVB NEE cold curves, shown in figure 8B. As with the hot curves, we begin by describing the general trend of the curves. At low loads, at which client demand does not fill the broadcast, NEE cold is identical to NEE hot. This constant trend, in our experiments, is visible at loads less than 0.075 clients/sec. As load increases, the broadcast fills up, and hot items begin to bump cold items from the broadcast, causing an increase in the slope of NEE as some clients requesting cold items are forced to wait more than one broadcast period for their DCIs to (re)appear. This effect gets exacerbated with increasing load, explaining the rapidly increasing trend exhibited by the NEE curves between loads of 0.075 and 0.2.

As load increases further, clients who are interested in cold items wait for periods longer than the DWC designated by the system (this is explained in detail in the hot NEE discussion above), and cold items are awarded high priorities. This forces their inclusion into the broadcast at the expense of hot items. In the hot NEE curves, this caused a sharp increase in the NEE values. For exactly the same reason, it causes a decrease in the slope of the cold NEE curve, as the availability of cold items increases.

We now compare the CBS and IVB cold NEE curves to one another. At low loads, there is a small (barely perceptible) difference between CBS NEE cold and IVB NEE cold, with CBS outperforming IVB. At this level of client demand, the hot and cold items requested by clients fit into the fixed period of the broadcast. CBS cold clients are able to read from consecutive broadcasts, realizing some energy savings from the smart tuning protocols. Once hot items begin to bump cold items from the broadcast, however, very few clients requesting cold items are able to read their DCIs from consecutive broadcasts. As remarked earlier, unless clients can retrieve from successive broadcasts, our method assumes that all data has changed and forces the download of the entire data cluster, which is exactly what the IVB protocol does. Thus, for higher loads, NEE cold for CBS and IVB are equal.

Summary. To summarize, the smart tuning aspects of our protocols make sense if there is some locality of data access patterns, i.e., if a "hot" class of data items exists. If such access patterns do not exist, selective tuning does not have an appreciable impact.

Fig. 9. Experimental Results: [A] NEE curves for CBS and VBS hot clients; [B] NEE curves for CBS and VBS cold clients. (Both panels plot NEE against arrival rate in clients/second.)

10.2.2 Comparing Normalized Energy Expenditure for CBS and VBS. Having discussed the difference between the CBS and IVB protocols, we turn our attention to highlighting the difference between the CBS and VBS protocols. Again we separate the discussion of the hot and cold clients. Figures 9A and B present the NEE curves for hot and cold clients respectively.

NEE hot curves for CBS and VBS. Let us first consider the hot curves shown in figure 9A. The first item of interest is the fact that the CBS and VBS curves have very different shapes. This leads us to believe that the server-side protocols determine the shape of the curves (note that the CBS and IVB curves, with identical server-side protocols, have the same shapes). Let us, therefore, attempt to explain the shapes of the CBS and VBS hot NEE curves. The shape of the CBS NEE hot curve has been described in the CBS vs. IVB section above (though it looks different because figures 8A and 9A are scaled differently), so we concentrate on the VBS curve. The VBS NEE hot curve is characterized by two features: (a) as load increases, NEE increases, and (b) the increase is steeper at lower loads, i.e., the slope of the NEE curve decreases with increasing load.
Both of these features are easily explained:

- It is easily seen that NEE is closely related to broadcast size: as the broadcast size increases, clients have to wait longer to download their items. However, as mentioned before, even while in sleep mode, clients continue to expend energy. As broadcast size increases, fewer data clusters are downloaded for the same residence time (as waiting time increases). Thus, NEE continues to increase.

- Now we explain the decreasing slope of the NEE curve. At low loads, any new request is likely to be a request for an item not already in the broadcast. Recall that VBS will include every requested item in the broadcast. This means that at low loads, virtually every new request will result in an increase in broadcast size, which explains the (comparatively) higher slope at low loads. As load increases, it becomes more likely that clients will request items that are already in the broadcast. Thus, at these loads, not every request results in an increase in broadcast size. This results in the decreasing slope of the NEE curve with increasing load. In fact (though this is not shown in the figure), if the load is made high enough, the VBS NEE curve virtually flattens out.

Having explained the shape of the curves, we now compare the NEE hot curves for VBS and CBS. At low loads, where client demand is not high enough to fill the CBS broadcast, VBS outperforms CBS. At these loads, the CBS protocol fills the remainder of the broadcast with dead air, while the VBS broadcast adapts its size to the lower client demand. Thus, at low loads, VBS provides data to clients at lower energy expenditure per download than CBS. At the load at which clients request just enough items to fill the CBS broadcast, VBS and CBS perform identically. The VBS NEE hot curve crosses the CBS NEE hot curve at this load, since the size and content of the VBS broadcast is the same as that of the CBS broadcast. This occurred in our experiments at approximately 0.075 clients/sec, as shown in figure 9A. At higher loads, CBS outperforms VBS on NEE hot. Here, the CBS protocol responds to the larger number of client requests by prioritizing items for inclusion in the broadcast and ignoring some items, yielding relatively stable NEE hot values. The VBS protocol, however, expands to include all requested items, causing higher NEE values.

NEE cold curves for CBS and VBS. We now describe the NEE cold curves for CBS and VBS, shown in figure 9B. As usual, we first consider the shapes. The CBS NEE cold curve is the same as that shown in figure 8B and described earlier. The VBS NEE cold curve is exactly the same as the VBS NEE hot curve, as the VBS protocol does not distinguish between hot and cold items: everything is included in the broadcast. We now compare the performance of VBS and CBS for energy consumption for clients requesting cold items. These curves, shown in figure 9B, are particularly interesting, as VBS NEE cold outperforms CBS NEE cold at virtually all loads. At low loads, the VBS protocol adapts to low client demand, while the CBS protocol transmits dead air if not enough requests are received to fill the broadcast. Eventually there comes the (by now well known) point where the broadcast sizes are identical. At this point, the VBS and CBS curves meet. From this point to a load level of approximately 0.1 clients/sec, CBS and VBS perform virtually identically, as the VBS broadcast size grows and the CBS broadcast begins bumping some items.
This situation, however, does not last. With increasing load, more and more cold items get bumped from the CBS broadcast, and client demands for cold items are increasingly left unsatisfied. In the VBS broadcast, on the other hand, while the broadcast size increases, clients are guaranteed to find their items in every broadcast. Eventually there comes a point where cold clients are starved longer under CBS than they would have to wait through the longer broadcast under VBS. Beyond this load point, which occurred at 0.11 clients/sec in our experiments, VBS exhibited a lower NEE than CBS.

Summary. In summary, for hot clients, CBS outperforms VBS at all but very low loads, i.e., loads at which the CBS broadcast is not full; there, VBS dominates. For cold clients, at all load levels, the VBS protocol either outperforms or performs identically to the CBS protocol. This makes sense, since the CBS protocol is optimized to provide better service to clients interested in more popular items at the expense of service for cold items.

10.2.3 Comparison of Access Times. Having presented the NEE results for the baseline case, we now turn our attention to analyzing the other metric, namely Access Time (AT). In presenting AT results, we follow an organization identical to the one adopted for the NEE results. But first we make an important observation that applies to all the AT results presented here. It turns out that the AT curves have a shape similar to the NEE curves, and the relative positions of these curves with respect to each other are the same as those in the NEE case. In other words, the AT results exhibit trends very similar to the NEE results. In most of the AT results presented in the literature (e.g., [Imielinski et al. 1997]), AT and tuning time (TT) show quite different forms, and TT has been the measure traditionally used to demonstrate energy efficiency. In this paper, however, instead of using TT as a measure of energy efficiency, we have used an actual energy consumption cost model, which, needless to say, adds significant accuracy to our model. Moreover, it goes to show that TT is not an accurate predictor of energy consumption. We provide the rationale for these observations below.

First of all, let us explore why the AT and NEE curves exhibit similar trends. Clearly, AT depends upon how long clients have to wait to access their data items: since AT is in units of time, an additional unit of time added to the waiting period of a client adds an additional unit of time to its AT. The computation of NEE, however, is not so simple. Simply put, clients spend energy in two ways: (a) at a higher rate in the active mode, and (b) at a constant low rate in the wait, or sleep, mode (i.e., when all four components are asleep). It is easily seen, then, that if a client were perpetually asleep (or waiting), its energy consumption would be directly proportional to time (i.e., of the form k·t, where k represents the constant energy consumption per unit time in the sleep mode). In other words, during the time when a client is asleep, its NEE curves will have exactly the same shape as its AT curves (scaled by the factor k, of course). The issue, however, is that a client is not always asleep; rather, it wakes up occasionally to download data items. Moreover, different components wake up at different times; e.g., the MDC wakes up to transmit a request when a client arrives at a cell, or to download from a broadcast; the disk wakes up to write data to the cache (the disk, however, may be asleep while the MDC is transmitting); and so on. In other words, while the system is awake (which occurs when any one or more of the four components are awake) there is variable power dissipation, i.e., non-uniform energy consumption. Clearly, therefore, if the system were awake continuously, we would observe an NEE curve quite different from the AT curve. Why, then, do we see such a similarity between the AT and NEE curves? The reason is that the system is sleeping (i.e., all components are asleep) most of the time. In our experiments, based on the parameter values chosen, clients on average slept through 90% of their residence in the cell (this of course ignores any internal computation that a client engages in, which has no bearing on our problem). This also makes perfect intuitive sense. After all, a typical client is often interested in a very small subset of all the information being broadcast. Therefore, if smart strategies are utilized (as we have demonstrated), the client can be kept in sleep mode most of the time. Thus, the sleep time power consumption dynamics greatly dominate the overall energy expenditure figures. As discussed above, when a client is sleeping its NEE curve is simply a scaled version of the AT curve. This explains why we see such marked similarities of form between the two curves.
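Schematically, the argument can be summarized as follows. The decomposition and notation here are ours, and we take the four components of the cost model to be the CPU, disk, display, and MDC; the exact cost model is the one developed earlier in the paper.

$$E \;=\; \underbrace{k\,T_{\mathrm{sleep}}}_{\text{all components asleep}} \;+\; \underbrace{\sum_{c} P_c\,T_{c}^{\mathrm{awake}}}_{\text{per-component active bursts}}, \qquad \mathrm{NEE} \;=\; \frac{E}{\text{buckets downloaded}}.$$

With $T_{\mathrm{sleep}}$ accounting for roughly 90% of residence time, the first term dominates, and NEE inherits the shape of AT up to the scale factor $k$.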
This discussion also explains why tuning time (TT) is not an accurate measure of energy expenditure. Clearly, since the "tuned-out" times greatly dominate the "tuned-in" times, TT, used by itself, mischaracterizes the nature of power dissipation in mobile clients.

Next, we show the AT curves that we generated, using the same organization as for the NEE graphs. Important note: We would like to draw the reader's attention to the fact that the shape of our empirical VBS AT curve is identical to the shape of the VBS AT curve in our analytical study, shown in figure 5B. We note that the shape of our CBS AT curve, however, does not match that of our analytical study, also shown in figure 5B. Our analytical CBS curve is flat, but our empirical CBS curve has a rising slope. This is easily explained. In our analytical study, we disregarded the notion of Ignore Factor in order to make our study mathematically tractable, while our empirical study considers the effects of Ignore Factor. This accounts for the rising slope of the CBS curve produced in our simulation study.

Access Times for CBS and IVB Protocols. The access times for the CBS and IVB protocols corresponding to hot and cold clients are shown in figures 10A and B respectively. The shapes are the same as those of the corresponding NEE curves (figures 8A and B), for reasons explained above. The noteworthy feature of these figures is that the CBS and IVB curves are identical. This is expected, as the basic CBS and IVB server protocols are identical; the only difference is smart tuning, which has no effect on access times because it is part of the client-side protocol.

Fig. 10. Experimental Results: [A] AT curves for CBS and IVB hot clients; [B] AT curves for CBS and IVB cold clients. (Both panels plot Access Time in seconds against arrival rate in clients/second.)

Access Times for CBS and VBS Protocols. The access times for the CBS and VBS protocols corresponding to hot and cold clients are shown in figures 11A and B respectively.
The shapes are the same as those of the corresponding NEE curves (figures 9A and B), for reasons explained above. For reasons of brevity we do not discuss these curves any further.

Fig. 11. Experimental Results: [A] AT curves for CBS and VBS hot clients; [B] AT curves for CBS and VBS cold clients. (Both panels plot Access Time in seconds against arrival rate in clients/second.)

10.3 Effect of varying the size of the broadcast

Having discussed our baseline results, we now turn our attention to presenting the results of some sensitivity analyses. We first discuss the effects of changing broadcast size. Clearly, this discussion is only valid for the CBS protocol, as the VBS strategy does not have a fixed broadcast size. Size is an important parameter, as it simultaneously impacts two important features of the broadcast: (a) the wait times of clients (the longer the broadcast, the longer, on average, a client must wait, even if his item of interest is included in every broadcast), and (b) the inclusion contention for items (the shorter the broadcast, the greater the number of items that must be left out). From this point on in this paper, we no longer report AT results. This saves space, and does not take away from the quality of the presentation, as the AT curves are quite similar to the NEE curves. The NEE results for these experiments corresponding to hot and cold clients are presented in figures 12A and B respectively.

Fig. 12. Experimental Results: [A] NEE curves for CBS hot clients for varying BroadcastSize values; [B] NEE curves for CBS cold clients for varying BroadcastSize values. (Both panels plot NEE against arrival rate, with curves for broadcast sizes of 1060, 1590, and 2120 seconds.)

NEE hot curves for varying the size of the broadcast. First, we discuss the NEE hot curves for varying broadcast sizes, shown in figure 12A. We note that the curve for the broadcast size of 1590 seconds is the same as the CBS NEE hot baseline curve discussed earlier in connection with figure 8A. We further remind the reader that the size of the baseline broadcast was chosen such that it accommodates virtually all of the hot items in the database. This will lead to an important conclusion at the end.

A larger broadcast size of 2120 seconds produces a much flatter NEE hot curve than the baseline curve. Since the broadcast size is larger, more items can be included in it. This leads to less contention for broadcast space, and fewer ignored items per broadcast. Thus, fewer clients are starved through multiple broadcasts in order to download their DCIs, leading to a lower slope than that of the baseline curve.
However, clients are forced to wait longer because of the increased broadcast size. This results in increased power consumption and, consequently, increased energy expenditure. Thus the curve for 2120 seconds lies above the curve for 1590 seconds, i.e., the overall NEE hot values for the larger broadcast size are higher than those of the smaller-sized broadcast. Important Note: It turns out that at an arrival rate of 1.5 clients/sec the 1590-second curve crosses, and then overtakes, the 2120-second curve. This is not shown in the figure, as average arrival rates this high are unlikely in real applications.

We also experimented with a smaller broadcast size of 1060 seconds, shown in figure 12A. A broadcast of 1060 seconds can accommodate fewer items than the baseline. Items are therefore bumped more frequently, causing more clients to wait through multiple broadcasts for their DCIs than in the baseline case. Because items are bumped more frequently, the slope of the NEE curve for the smaller broadcast size is greater than that of the baseline case.

Next we compare the NEE hot values for the 1060-second curve and the baseline curve. At low arrival rates, few clients find their items ignored at either broadcast size. Therefore the smaller-sized broadcast shows better NEE than the baseline case (for a similar reason, the baseline VBS hot NEE curve was below the CBS baseline hot NEE curve; see the discussion pertaining to figure 9A). However, with increasing load, the smaller-sized broadcast faces high item inclusion contention, resulting in a steep increase in NEE (visible around 0.16 clients/sec in our figure). Finally, the baseline curve shows better overall NEE values starting at 0.18 clients/sec.

NEE cold curves for varying the size of the broadcast. The cold curves, shown in figure 12B, are as expected: the 2120-second curve has a smaller slope, and the 1060-second curve a larger slope, than the baseline (1590-second) curve. The only difference from the hot curves is that the crossover points differ. The smallest-size curve rapidly exhibits the worst NEE, which is expected, as the cold items are rapidly bumped by the hot items owing to the small size of the broadcast. Moreover, the baseline curve exhibits higher NEE figures than the largest-size curve at moderate arrival rates. This is also expected, as the largest broadcast allows more cold items to remain in the broadcast, resulting in reduced client starvation.

Summary. In summary, the following conclusions can be drawn from this experiment. If access locality is not present, a small broadcast size is strictly non-optimal. However, in the more realistic case where access locality is indeed present, the two important determinants of broadcast size are (a) the size required to accommodate the hot items in the database, and (b) the expected system load. If system load is expected to be low, a good rule of thumb is to choose a broadcast size smaller than that required to accommodate all the hot items; in our experiments we found that a broadcast that can accommodate approximately 80% of all hot items works fairly well for low-load systems. However, if system load is expected to be moderate to high, it is near-optimal to choose a size that can accommodate all the hot clusters. One important (and somewhat counter-intuitive) lesson is that larger sizes have very little marginal benefit: the smaller inclusion contention realized as a result of larger sizes is more than outweighed by the longer wait times.
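This rule of thumb is easy to operationalize. In the sketch below, the 0.8 and 1.0 factors are taken from the summary above, while the binary low/high load split and all numeric inputs are illustrative assumptions of our own.

```python
def choose_broadcast_size(num_hot_items: int, item_airtime_s: float,
                          low_load_expected: bool) -> float:
    """Return a broadcast period (seconds) sized against the hot set."""
    hot_set_airtime = num_hot_items * item_airtime_s
    # Low load: ~80% of the hot set worked well; otherwise fit all hot items.
    factor = 0.8 if low_load_expected else 1.0
    return factor * hot_set_airtime

# Illustrative numbers only, chosen so the result equals the 1590-second
# baseline broadcast size used in the experiments.
print(choose_broadcast_size(300, 5.3, low_load_expected=False))  # -> 1590.0
```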
10.4 Effect of Varying Client Mobility

Next we study the effects of varying client mobility, i.e., we study what happens if clients, on average, stay in cells for a longer or shorter duration of time. This is achieved by varying the Residence Latency (RL) characteristic of our model. We study both the CBS and VBS algorithms in isolation, and then provide a comparative discussion.

10.4.1 CBS NEE curves for varying client mobility

NEE hot curves for varying the MeanResidenceLatency parameter. First, we describe the NEE hot curves for varying the average client's length of residence in the cell, shown in figure 13A. We note that the NEE hot curve for a MeanResidenceLatency value of 7950 seconds is the same as the baseline CBS NEE hot curve, discussed in connection with figure 8A.

Fig. 13. Experimental Results: [A] NEE curves for CBS hot clients for varying MeanResidenceLatency values; [B] NEE curves for CBS cold clients for varying MeanResidenceLatency values. (Both panels plot NEE against arrival rate, with curves for RL values of 5300, 7950, and 10600 seconds.)

We first make the observation that changing RL has the effect of altering the client load on the system. Clearly, given a specific arrival rate, if RL is increased, the client density in the cell increases, as clients stay longer. Similarly, decreasing RL has the effect of reducing system load.
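This is simply Little's law applied to the cell; the numeric example below pairs the baseline RL with one of the plotted arrival rates, a pairing of our own choosing.

$$\bar{N} \;=\; \lambda \cdot \mathrm{RL}; \qquad \text{e.g., } \lambda = 0.1\ \tfrac{\text{clients}}{\text{s}},\ \mathrm{RL} = 7950\ \mathrm{s} \;\Rightarrow\; \bar{N} \approx 795 \text{ clients in the cell.}$$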
For a longer average stay in the cell of 10600 seconds, the NEE curve has a larger slope than the baseline curve. Here, the load on the system is greater than that of the baseline case. Clients remain in the cell for a longer period of time, increasing the number of items requested. This increases contention for space in the broadcast. More clients are forced to wait through multiple broadcast periods to retrieve their DCIs, leading to higher average NEE hot values than the baseline NEE hot values.

For a shorter average stay in the cell of 5300 seconds, the NEE hot curve has a smaller slope than the baseline curve. A smaller number of clients in the cell produces a smaller number of requested items, less contention for space in the broadcast, and shorter overall waits for clients to retrieve their DCIs. This leads to lower overall NEE hot values and a flatter curve than in the baseline case.

NEE cold curves for varying the MeanResidenceLatency parameter. We now describe the NEE cold curves, shown in figure 13B. We note that the NEE cold curve for an RL value of 7950 seconds is the same as the baseline NEE cold CBS curve, discussed in connection with figure 8B. A larger RL of 10600 seconds produces a curve similar in shape to the baseline curve, but with a larger slope. As with the corresponding hot curve, the longer client stay in the cell leads to a larger number of clients in the cell. This, in turn, leads to a larger number of requested items and more contention for space in the broadcast. Clients must, on average, wait longer for their DCIs, causing both larger absolute NEE cold values and a larger slope than in the baseline case. A smaller RL of 5300 seconds leads to a smaller number of clients in the cell, and less contention for space in the broadcast. Fewer items are excluded from the broadcast than in the baseline case, so fewer clients wait through multiple broadcast periods for their DCIs. This leads to lower overall NEE cold values and a smaller slope of the NEE cold curve than in the baseline case.

Fig. 14. Experimental Results: NEE curves for VBS clients for varying MeanResidenceLatency values. (NEE against arrival rate, with curves for RL values of 5300, 7950, and 10600 seconds.)

10.4.2 VBS NEE curves for varying client mobility

NEE curves for varying the MeanResidenceLatency parameter. First, we note that the VBS protocol does not distinguish between hot and cold items, i.e., all items are included in the broadcast. This results in cold curves that are identical to the hot curves. Therefore, we do not separate the presentation of hot and cold VBS NEE curves. Next, we describe the NEE curves for varying the length of clients' residence in the cell for the VBS protocol, shown in figure 14. The curve for an RL of 7950 seconds is the same as the baseline NEE VBS curve, discussed in connection with figure 9A.

A larger RL value of 10600 seconds produces a curve similar in shape and slope to the baseline curve, but with higher absolute NEE values. Longer client stays lead to a larger number of clients, and therefore more requested items. Since the size of the VBS broadcast increases to adapt to the larger demand, the larger RL value leads to higher NEE values than in the baseline case. A smaller RL value of 5300 seconds also produces a curve similar in shape and slope to the baseline curve, but with lower absolute NEE values. Shorter client stays lead to a smaller number of clients and requested items. The VBS broadcast shrinks to accommodate the lower load on the system, which leads to shorter waits to download data. This leads to lower NEE values than in the baseline case.

A Comparative Study of CBS and VBS for varying client mobility. We first consider the curves for CBS and VBS NEE hot, shown in figure 15A. As in the baseline case, for all variations of RL, VBS hot outperforms CBS hot at low load levels, and CBS hot outperforms VBS hot at higher load levels. The main difference is the point of crossover. For a higher client density, i.e., a higher MeanResidenceLatency than baseline, the crossover point occurs at a lower load level. For a lower client density, the VBS curve crosses the CBS curve at a higher load than in the baseline case. The NEE cold curves in figure 15B are similar to the baseline curves for all variations of RL, in that VBS performs equally as well as or better than CBS.

Fig. 15. Experimental Results: [A] NEE curves for CBS and VBS hot clients for varying MeanResidenceLatency values; [B] NEE curves for CBS and VBS cold clients for varying MeanResidenceLatency values.
Summary. In summary, our experiments with varying the MeanResidenceLatency parameter show that performance degrades as the average length of a client's residence, and therefore the client density in the cell, increases. If access locality is present in the data set, VBS outperforms CBS at low loads, while CBS predominates at high loads. If there is no access locality, VBS outperforms CBS at all load levels. Finally, we note that RL is not a tunable parameter, but is instead determined by the geographic and demographic properties of the cell.

10.5 Effect of varying the number of items in the database

We now turn our attention to the effects of changing the number of items in the database. This is achieved by varying the DatabaseSize parameter in our model. We note that changing the number of items in the database changes the inclusion contention in the broadcast. A larger database increases the size of the hot set of items, which in turn increases contention for inclusion in the broadcast. Fewer items in the database, similarly, decreases contention for inclusion in the broadcast.

10.5.1 CBS curves for varying the number of items in the database

NEE hot for varying the number of items in the database. First, we describe the CBS NEE hot curves for varying database sizes, shown in figure 16A. We note that the curve for a database size of 3000 items is the same as the baseline CBS NEE hot curve, discussed earlier in connection with figure 8A.

A larger database size of 4500 items produces a curve with a much larger slope than that of the baseline curve. In this scenario, clients have a wider choice of data items, causing an increase in the overall number of requested items. This causes increased contention for space in the broadcast and longer waits for data items, as some items get bumped from the broadcast. This leads to larger NEE values at all load levels than in the baseline case.

The slope of the curve for a smaller database size of 1500 items is much flatter than that of the baseline curve. The smaller choice of items leads to a smaller number of requested items and less contention for broadcast space. Fewer clients must wait through multiple broadcasts to retrieve their DCIs, which is manifested in lower NEE values than in the baseline case.

NEE cold for varying the number of items in the database. We now discuss the CBS NEE cold curves for varying the size of the database, shown in figure 16B. We note that the curve for a database size of 3000 items is the same as the baseline CBS NEE cold curve, shown in figure 8B. The curve for a larger database size of 4500 items has a shape similar to that of the baseline curve, but a larger slope. The larger number of items vying for inclusion in the broadcast forces some items to be bumped, i.e., some clients are forced to wait longer than one broadcast period to retrieve their items. This leads to higher NEE cold values than in the baseline case.
Fig. 16. Experimental Results: [A] NEE curves for CBS hot clients for varying DatabaseSize values; [B] NEE curves for CBS cold clients for varying DatabaseSize values. (Both panels plot NEE against arrival rate, with curves for database sizes of 1500, 3000, and 4500 items.)

The curve for a smaller database size of 1500 items has a shape similar to that of the baseline curve, but is much flatter. A narrower choice of items leads to a smaller list of requested items, and less contention for broadcast space. This, in turn, results in fewer clients having to wait longer than one broadcast period to retrieve their items, which produces lower average NEE cold values than in the baseline case.

10.5.2 VBS curves for varying the number of items in the database

NEE for varying the number of items in the database. First, we remind the reader that VBS does not distinguish between hot and cold items when determining broadcast content, leading to cold curves that are identical to the hot curves. For this reason, we do not separate the presentation of hot and cold VBS NEE curves. Next, we describe the VBS NEE curves for varying database sizes, shown in figure 17. We note that the curve for a database size of 3000 items is the same as the VBS NEE baseline curve, shown in figure 9A.

Fig. 17. Experimental Results: NEE curves for VBS clients for varying DatabaseSize values. (NEE against arrival rate, with curves for database sizes of 1500, 3000, and 4500 items.)

The curve for a larger database size of 4500 items has a shape similar to that of the baseline curve, but a larger slope and higher NEE values. The larger choice of items leads to a larger list of requested items for inclusion in the broadcast. Since the VBS broadcast size adapts to the larger number of requested items, clients must wait longer to retrieve their DCIs. Longer waits lead to higher NEE values than in the baseline case. The smaller database size of 1500 items produces a curve with the same general shape as the baseline curve, but with a smaller slope and lower NEE values. The narrower choice of items leads to smaller numbers of requested items, and smaller broadcast sizes. Client waits for DCIs are shorter than in the baseline case, which is manifested in lower NEE values than in the baseline case.

A Comparative Study of CBS and VBS. We first compare the CBS and VBS NEE hot curves, shown in figure 18A. We note that the trends are similar to those of the baseline case: VBS performs better than CBS at low loads, and CBS outperforms VBS at higher loads. In addition, we see that for larger database sizes, due to higher contention for broadcast space, the VBS curve crosses the CBS curve at a lower load than in the baseline case, while the curve for a smaller database size crosses the CBS curve at a higher load than baseline.
We also note that the difference between the CBS and VBS NEE values is larger than the difference between the baseline values for larger database sizes, and smaller for smaller database sizes. We now consider the CBS and VBS NEE cold curves, shown in figure 18B. As in the baseline case, the VBS protocol performs equally as well as or better than CBS for all variations of database size. We also note that the difference between the CBS and VBS NEE values is larger than the difference in the baseline case for larger database sizes, and smaller for smaller database sizes.

Fig. 18. Experimental Results: [A] NEE curves for CBS and VBS hot clients for varying DatabaseSize values; [B] NEE curves for CBS and VBS cold clients for varying DatabaseSize values.

Summary. Our experiments show that our protocols perform better for smaller database sizes than for larger databases. Intuitively, this makes sense, since a larger list of items to be broadcast will cause either contention for broadcast space (in the CBS protocol) or a longer broadcast (in the VBS protocol). If access locality is present in the data set, VBS outperforms CBS at low loads on the system, while CBS outperforms VBS at higher loads. If no access locality is present, VBS outperforms CBS at all load levels.

10.6 Effect of varying the update rate

We now consider the effects of varying the update rate, i.e., varying the MeanChangeFrac parameter in our model. We consider the CBS and VBS protocols separately, then provide a comparative discussion.

10.6.1 CBS curves for varying the update rate

NEE hot for varying the update rate. First, we describe the CBS NEE hot curves for varying the update fraction, shown in figure 19A. We note that the curve for an update fraction of 0.02 updates per item per second is the same as the baseline CBS NEE hot curve, shown in figure 8A.

Fig. 19. Experimental Results: [A] NEE curves for CBS hot clients for varying MeanChangeFrac values; [B] NEE curves for CBS cold clients for varying MeanChangeFrac values. (Both panels plot NEE against arrival rate, with curves for MeanChangeFrac values of 0.01, 0.02, and 0.025.)

A larger update fraction of 0.025 produces a curve with a shape similar to that of the baseline case, but with higher overall NEE values and a smaller slope. More updates to an item lead to more dirty buckets for that item. Clients reading from consecutive broadcasts must, therefore, read more buckets per broadcast than in the baseline case. As load increases, fewer clients are able to read from consecutive broadcasts, since items are bumped more frequently as load increases. These clients must wait longer than one broadcast period to retrieve their DCIs, and must assume that all buckets are dirty, since they did not find their items in the previous broadcast. The client-side download rule at work here is sketched below.
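A minimal sketch of that rule, with names of our own invention; the dirty-bucket mechanics themselves are defined in the protocol sections earlier in the paper.

```python
def buckets_to_read(read_previous_broadcast: bool,
                    cluster_size: int, dirty_flags: list[bool]) -> int:
    """Number of buckets a client must download for its DCI."""
    if not read_previous_broadcast:
        # Missed a broadcast: assume every bucket changed and download the
        # entire data cluster (this is what an IVB client always does).
        return cluster_size
    # Reading consecutive broadcasts: download only the dirty buckets.
    return sum(dirty_flags)
```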
A smaller update fraction of 0.01 produces a curve with a shape similar to that of the baseline curve, but with lower overall NEE values and a larger slope. At low loads, many clients are able to retrieve their DCIs from consecutive broadcasts, and are therefore able to achieve the energy savings of the dirty-read algorithm. As load increases, fewer clients find their DCIs in consecutive broadcasts, causing the slope of the curve to increase at higher loads.

NEE cold for varying the update rate. The CBS NEE cold curves, shown in figure 19B, are very similar to the baseline NEE cold curves. We note that the curve for an update fraction of 0.02 is the same as the baseline CBS NEE cold curve. At low loads, there is some difference between the varying update fractions: the higher update fraction of 0.025 produces a higher NEE, and the lower update fraction a lower NEE. As load increases, the curves increasingly converge, since most clients are unable to retrieve their DCIs from consecutive broadcasts and must read the entire DCI when it does appear.

Fig. 20. Experimental Results: NEE curves for VBS clients for varying MeanChangeFrac values. (NEE against arrival rate, with curves for MeanChangeFrac values of 0.01, 0.02, and 0.025.)

10.6.2 VBS curves for varying the update rate

NEE for varying the update rate. First, we note that we show only one VBS NEE curve here because the VBS protocol does not distinguish between hot and cold items in deciding on broadcast content, producing cold curves that are identical to the hot curves. Therefore, we do not separate the presentation of hot and cold VBS NEE curves here. Next, we describe the VBS NEE curves for varying update rates, shown in figure 20. We note that the curve for an update rate of 0.02 updates per item per second is the same as the VBS NEE baseline curve, shown in figure 9A.

A larger update rate of 0.025 updates per item per second produces a curve similar in shape and slope to the baseline curve. At low loads, the curve for the larger update rate is slightly higher than the baseline curve. Here, clients reading only dirty buckets must download more than the same clients in the baseline case, because more updates per item have arrived. Higher loads, as explained earlier, increase the broadcast size. A larger broadcast size leads to more updates per item during those broadcasts. Thus, few buckets are "clean", and little energy savings can be achieved. Here, the curves for the larger update rate and the baseline almost overlap, because clients must read virtually all the buckets of their DCIs during the longer broadcast periods.

A smaller update rate produces a curve similar in shape and slope to the baseline curve. At low loads, the curve is slightly below the baseline curve.
Since fewer updates have been made to the data items, clients are required to read fewer buckets to keep their caches consistent. At higher loads, however, most of the buckets in the broadcast are "dirty", and clients are forced to read virtually all of the buckets of their DCIs to keep their caches consistent. Thus, at high loads, the curves for the smaller update rate and the baseline virtually overlap.

A Comparative Study of CBS and VBS. First, we note that varying the update fraction does not affect the server-side protocols, i.e., the broadcast structure is the same for all update fractions. The only difference is the number of buckets a client must read, if she is able to read from consecutive broadcasts. Thus, the relationship between CBS and VBS for both hot and cold NEE is the same as in the baseline case. These curves are shown in figures 21A and B.

Fig. 21. Experimental Results: [A] NEE curves for CBS and VBS hot clients for varying MeanChangeFrac values; [B] NEE curves for CBS and VBS cold clients for varying MeanChangeFrac values.

Summary. In summary, we note that the update fraction affects NEE through the number of buckets a client must read, rather than through the amount of time a client must wait for his DCI. Higher update fractions cause clients to read more buckets, leading to higher NEE values, while lower update fractions lead to lower NEE values.

10.7 Effectiveness of the Ignore Factor Construct

We designed our protocols to provide the best possible service for clients interested in hot items, while ensuring that all clients are able to download their DCIs at least once during their stay in the cell. This is achieved by using the Ignore Factor (IF) construct. In order to examine the effectiveness of the IF feature, we tracked the percentage of clients starved throughout their entire length of stay in the cell. Note that in VBS the question of starvation does not arise, as all requested items are included in the broadcast. Consequently, we only explore the issue of starvation with regard to the CBS scenario. Another curious feature that emerged when we started exploring this issue was that no significant starvation occurred at the client loads considered so far. In order to see significant client starvation, we had to increase the load on the system far beyond that considered in our other experiments. Thus, the arrival rates shown here represent not only those used in our baseline experiments, but also several significantly higher loads.

10.7.1 Percentage of clients starved for hot items. Since our protocols were designed to provide preferential service for clients interested in the more popular items, we were not surprised to find no significant starvation of clients interested in hot items.

10.7.2 Percentage of clients starved for cold items. We report the results of our starvation study for CBS in figure 22.
Fig. 22. Experimental Results: Percentage of cold clients starved for CBS. (Percentage of clients starved against arrival rate, which here extends to 0.5 clients/second, with curves for broadcast sizes of 1060, 1590, and 2120 seconds.)

For brevity, we make two important observations:

- For CBS clients interested in cold items, we found that smaller broadcast sizes lead to increased contention for broadcast space, and therefore to higher levels of client starvation, while larger broadcast sizes have a lower level of contention for broadcast space and less client starvation.

- While some cold clients are indeed starved, the overall percentage of clients starved is very low: in the worst case (i.e., for the highest load and smallest broadcast size we tested), only 12.2% of clients interested in cold items were unable to download their DCIs. Recalling that only 20% of clients overall are interested in cold items, we see that at most 2.4% of clients overall are starved, even at the extreme loads we considered in these experiments (the arithmetic appears below).
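The overall bound follows directly from the two quoted percentages:

$$0.122 \times 0.20 \;=\; 0.0244 \;\approx\; 2.4\%.$$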
11. CONCLUSION

In this paper, we looked at the problem of data retrieval by mobile units from wireless broadcasts. While the problem of data access in broadcasts has been investigated previously, deciding the specific content of the broadcast has remained unexplored. Specifically, in the presence of mobile clients with differing demands, the determination of the broadcast content is an important problem. The problem addressed in this paper may be captured by the following two questions:

- Given that users are highly mobile in their mobility domain, what are good strategies that the server can use to decide on broadcast content?

- Given that good broadcast strategies are found, what are good retrieval algorithms by which users can retrieve/download data from the broadcast with a minimum of energy expenditure?

In this work, we presented adaptive protocols for broadcast content determination and information retrieval. Periodicity has been regarded as an important parameter in broadcast-and-retrieve environments. We thus considered both periodic (constant broadcast size (CBS)) and aperiodic (variable broadcast size (VBS)) broadcast strategies. In the constant broadcast size strategy, we fixed the length of a broadcast and added or dropped items from the broadcast based on their popularity. We also introduced a metric called the Ignore Factor, to weigh items by how long requests for them have been ignored, so that even if they are not popular enough to be included in the broadcast, the clients requesting them are not starved. In the variable broadcast size strategy, the broadcast size changes according to the number of items requested during the previous broadcast period. Clients are guaranteed to receive the items that they requested. Since this strategy can potentially include all the items in the database, we introduced the concept of Residence Latency, which drops items from the broadcast based on the expected time that a client requesting an item will stay in the cell.

We performed an approximate yet thorough performance analysis by analytical methods. Our analysis of these strategies suggests that the aperiodic (VBS) strategy outperforms the periodic (CBS) strategy at low system loads. For moderate loads, VBS provides lower tuning times, while CBS provides lower access times. At high loads, CBS outperforms VBS overall, providing lower tuning and access times. Also, our analytical study indicates that the number of index segments in the broadcast is the most important parameter for VBS. This parameter has the greatest effect on the access time, since the broadcast size in this strategy is not limited. While informative, our analytical study is subject to certain limitations; e.g., we omitted consideration of the Ignore Factor in order to make our analysis mathematically tractable. To overcome these limitations, we developed a detailed cost model for our protocols, and evaluated the performance of our protocols through simulation.

In our baseline simulation experiments, we found that our CBS protocols produced significant energy savings over earlier protocols. We also found that, in general, the aperiodic VBS protocol outperformed the periodic CBS protocol at low loads. At higher loads, however, the CBS protocol predominated. These results confirm the basic results of our analytical study. In terms of efficiency, we discovered that, for broadcasts in which clients are interested in only a small portion of the data, trends in Access Time are virtually the same as trends in energy consumption. We noted that the standard metric for energy efficiency, Tuning Time, misrepresents energy consumption by ignoring energy use in the sleep state. For this reason, we built a detailed cost model and measured the actual energy expenditure through our Normalized Energy Expenditure metric.

In addition to our baseline experiments, we also experimented with varying some key parameters. We noted that increasing MeanResidenceLatency or DatabaseSize caused increased contention for inclusion of items in the broadcast, and therefore higher NEE values; decreasing these parameters produced lower NEE values. Conversely, increasing the BroadcastSize decreased contention for space in the broadcast, which produced NEE values that grew very slowly in proportion to the load on the system, while a smaller BroadcastSize produced NEE values that grew much faster in proportion to load. Varying the MeanChangeFrac parameter showed that, at low loads, a larger number of updates caused larger NEE values, while a smaller number of updates caused lower NEE values; this effect disappeared as load increased. Finally, we considered the percentage of clients starved across varying load levels in order to show the quality of service provided by our protocols. At normal loads, we found no significant client starvation. At much higher loads, we began to see minor starvation of clients interested in cold items, but no starvation of clients interested in hot items.

REFERENCES

Acharya, S., Alonso, R., Franklin, M., and Zdonik, S. 1995. Broadcast disks: Data management for asymmetric communication environments. In Proc. ACM SIGMOD (San Jose, California, May 1995), pp. 199-210.

Acharya, S., Franklin, M., and Zdonik, S. 1996a. Disseminating updates on broadcast disks. In Proc. 22nd VLDB Conference (Mumbai, India, September 1996), pp. 354-365.

Acharya, S., Franklin, M., and Zdonik, S. 1996b. Prefetching from a broadcast disk. In Proc. Intl. Conf. on Data Engg. (New Orleans, Louisiana, February 1996), pp. 276-285.

Acharya, S., Franklin, M., and Zdonik, S. 1997. Balancing push and pull for data broadcast. In Proc. ACM SIGMOD (Tucson, Arizona, May 1997), pp. 183-194.

Alonso, R. and Korth, H. 1993. Database issues in nomadic computing.
In Proceedings of the 1993 ACM SIGMOD (Washington, DC, May 1993), pp. 388-392.

Badrinath, B. 1997. Personal communication. Computer Science Dept., Rutgers University.

Badrinath, B. R. and Imielinski, T. 1992. Replication and mobility. In Second Workshop on the Management of Replicated Data (November 1992), pp. 9-12.

Barbara, D. and Imielinski, T. 1994. Sleepers and workaholics: Caching strategies in mobile environments. In R. Snodgrass and M. Winslett, Eds., Proceedings of the 1994 ACM SIGMOD (May 1994), pp. 1-12.

Corporation, P. 1997. Technical guide for lithium ion cell model ICR-18650. http://www.polystor.com/.product/tguide.htm.

Lam, D., Cox, D. C., Jannink, J., and Widom, J. 1995. Modeling location management in personal communication services. Technical report (October), Stanford University.

Datta, A. 1994. Research issues in databases for Active Rapidly Changing data Systems (ARCS). ACM SIGMOD Record 23, 3 (September), 8-13.

Dunham, M. H. and Helal, A. 1995. Mobile computing and databases: Anything new? SIGMOD Record 24, 4 (December), 5-9. Association for Computing Machinery.

Fishwick, P. 1995. Simulation Model Design and Execution: Building Digital Worlds. Prentice Hall.

Forman, G. H. and Zahorjan, J. 1994. The challenges of mobile computing. IEEE Computer 27, 4 (April), 38-47.

Franklin, M. 1997. Personal communication. Computer Science Dept., University of Maryland.

Huang, Y., Sistla, P., and Wolfson, O. 1994. Data replication for mobile computers. SIGMOD Record 23, 2, 13-24.

Imielinski, T. and Badrinath, B. R. 1992. Querying in highly mobile distributed environments. In Proceedings of the 18th International Conference on Very Large Databases (August 1992), pp. 41-52.

Imielinski, T. and Badrinath, B. R. 1994. Mobile wireless computing. Communications of the ACM 37, 10 (October), 19-28.

Imielinski, T., Viswanathan, S., and Badrinath, B. R. 1997. Data on air: Organization and access. IEEE Transactions on Knowledge and Data Engineering 9, 3 (May/June), 353-372.

Imielinski, T., Viswanathan, S., and Badrinath, B. R. 1994a. Energy efficient indexing on air. In Proc. ACM SIGMOD (May 1994), pp. 25-36.

Imielinski, T., Viswanathan, S., and Badrinath, B. R. 1994b. Power efficient filtering of data on air. In Proc. EDBT (March 1994), pp. 245-258.

Motorola, Inc. 1997. Personal Messenger 100C modem specifications. http://www.mot.com/MIMS/WDG/techLibraryDir/specsDir/pm100cSpecs.html.

Li, K., Kumpf, R., Horton, P., and Anderson, T. 1993. A quantitative analysis of disk drive power management in portable computers. Technical Report CSD-93-779, Computer Science Division, University of California, Berkeley.

Malyan, A. D., Ng, L. J., Leung, V. C., and Donaldson, R. W. 1993. Network architecture and signaling for wireless personal communications. IEEE Journal on Selected Areas in Communications 11, 6 (August), 830-841.

NYSE, Inc. 1997. NYSE listed companies. http://www.nyse.com/public/about/lstcotoc.html.

Krishna, P., Vaidya, N. H., and Pradhan, D. K. 1994. Location management in distributed mobile environments. In Proceedings of the Conference on Parallel and Distributed Information Systems (September 1994), pp. 81-88.

Satyanarayanan, M. 1996. Accessing information on demand at any location: Mobile information access. IEEE Personal Communications 3, 1, 26-33.

Seagate Technology, Inc. 1997. Marathon family of disc drives. http://www.seagate.com/disc/marathon/marasl.shtml.

Shivakumar, N. and Widom, J. 1994.
User profile replication for faster location lookup in mobile environments. Technical report, Stanford University.

Anantharam, V., Honig, M. L., Madhow, U., and Wei, V. K. 1994. Optimization of a database hierarchy for mobility tracking in a personal communications network. Performance Evaluation 20, 2 (May), 287-300.

Vaidya, N. H. and Hameed, S. 1996a. Data broadcast in asymmetric environments. In First International Workshop on Satellite-based Information Services (WOSBIS) (November 1996).

Vaidya, N. H. and Hameed, S. 1996b. Scheduling data broadcast in asymmetric communication environments. Technical Report 96-022 (Nov.), Computer Science Department, Texas A&M University, College Station.

Wolfson, O. and Jajodia, S. 1992. Distributed algorithms for dynamic replication of data. In Proceedings of the Symposium on Principles of Database Systems (1992), pp. 149-163.

Zdonik, S., Franklin, M., Alonso, R., and Acharya, S. 1994. Are "disks in the air" just pie in the sky? In Proc. IEEE Workshop on Mobile Computing Systems and Applications (Santa Cruz, California, December 1994).

Appendix

Proof of Lemma 1. Let $T_B = m(I+B)$ be the size of the broadcast in the Constant Broadcast Size strategy. For an arbitrary $j$,
$$P\{T_w = \tfrac{1}{2}T_B + jT_B\} \;=\; p_h(1-p_{h,in})^j\,p_{h,in} \;+\; p_c(1-p_{c,in})^j\,p_{c,in}.$$
Therefore,
$$E[T_w] \;=\; \sum_{j=0}^{\infty}\Bigl(\tfrac{1}{2}+j\Bigr)\,m(I+B)\,\bigl(p_h(1-p_{h,in})^j p_{h,in} + p_c(1-p_{c,in})^j p_{c,in}\bigr).$$

Proof of Lemma 2.
$$P_{in} \;=\; p_{h,in}\,p_h + p_{c,in}\,p_c, \qquad
p_{h,in} \;=\; \sum_{\{B_h\}} \frac{(\beta_1+\cdots+\beta_b)!}{\prod_{i=1}^{b}\beta_i!}\,\prod_{i=1}^{b} p_i^{\beta_i}, \qquad
p_{c,in} \;=\; \sum_{\{B_c\}} \frac{(\beta_1+\cdots+\beta_b)!}{\prod_{i=1}^{b}\beta_i!}\,\prod_{i=1}^{b} p_i^{\beta_i}. \tag{54}$$

Proof of Lemma 3. Lemma 3 is based on the scenario where the DCI is in the current broadcast and the client does not miss it at all. Scenario 1 is subdivided into two sub-scenarios: Scenario 1a, where the client initially tunes in to an index segment ($\overline{M}, TI$), and Scenario 1b, where the client initially tunes in to a data segment ($\overline{M}, TD$).
$$P\{\text{Scenario 1a}\} = P_{in}\,P\{\overline{M},TI\} = P_{in}\,\frac{m-1}{2m+S}, \qquad
P\{\text{Scenario 1b}\} = P_{in}\,P\{\overline{M},TD\} = P_{in}\,\frac{m-1}{2m}\cdot\frac{S}{2m+S}.$$
Since Scenario 1a and Scenario 1b are mutually exclusive events,
$$P\{\text{Scenario 1}\} = P\{\text{Scenario 1a}\} + P\{\text{Scenario 1b}\} = P_{in}\Bigl(\frac{m-1}{2m+S} + \frac{m-1}{2m}\cdot\frac{S}{2m+S}\Bigr).$$
The TT and AT are derived as follows. The client reads the current bucket and gets the address of the next nearest index segment, then reads the index tree, tuning in and out along the B-tree, and finally downloads the DCI:
$$TT_1 = 1 + l + S.$$
Under Scenario 1a, the client tunes in to the middle of an index segment, waits until the next index, reads the index, waits until the DCI, and downloads it:
$$AT_{1a} = \frac{I}{2} + B + I + L_1 + S.$$
Under Scenario 1b, the client tunes in to a data block, waits until the next index, reads the index, waits until the DCI, and downloads it:
$$AT_{1b} = \frac{B}{2} + I + L_1 + S.$$
Therefore,
$$E[AT_1 \mid \text{Scenario 1}] = P\{\text{Scenario 1a}\}\,AT_{1a} + P\{\text{Scenario 1b}\}\,AT_{1b}
= P_{in}\,\frac{m-1}{2m+S}\Bigl(\frac{I}{2} + B + I + L_1 + S + \Bigl(\frac{B}{2} + I + L_1 + S\Bigr)\frac{S}{2m}\Bigr).$$

All subsequent lemmas have a format similar to that of Lemma 3, i.e., they are concerned with deriving expressions for the probabilities of scenarios, as well as AT and TT expressions for these scenarios. In the previous lemma we derived the AT and TT expressions explicitly; in the subsequent lemmas we omit these derivations.
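Lemma 1's expectation lends itself to a quick numerical plausibility check: the closed form can be compared against a direct simulation of the waiting process. In the sketch below, all parameter values are illustrative stand-ins rather than the settings used in our experiments.

```python
import random

m, I, B = 10, 2.0, 20.0          # index segments, index size, data-segment size (assumed)
T_B = m * (I + B)                # broadcast period under CBS
p_h, p_c = 0.8, 0.2              # client wants a hot / cold item (assumed)
p_h_in, p_c_in = 0.95, 0.30      # chance the item is in a given broadcast (assumed)

def closed_form(terms: int = 10_000) -> float:
    """E[T_w] from Lemma 1, truncating the geometric tail."""
    return sum((0.5 + j) * T_B *
               (p_h * (1 - p_h_in) ** j * p_h_in +
                p_c * (1 - p_c_in) ** j * p_c_in)
               for j in range(terms))

def simulate(trials: int = 200_000) -> float:
    """Draw a client type, count broadcasts missing its DCI, accumulate waits."""
    total = 0.0
    for _ in range(trials):
        p_in = p_h_in if random.random() < p_h else p_c_in
        j = 0
        while random.random() > p_in:
            j += 1
        total += (0.5 + j) * T_B
    return total / trials

print(closed_form(), simulate())  # the two values should agree closely
```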
Proof of Lemma 4. Scenario 2 is subdivided into two sub-scenarios: Scenario 2a, where the client initially tunes in to an index segment ($M, TI$), and Scenario 2b, where the client initially tunes in to a data segment ($M, TD$).
$$P\{\text{Scenario 2a}\} = P_{in}\,P\{M,TI\}\,P_{in} = P_{in}^2\,\frac{m-1}{2m+S}, \qquad
P\{\text{Scenario 2b}\} = P_{in}\,P\{M,TD\}\,P_{in} = P_{in}^2\Bigl(\frac{m-1}{2m}+\frac{B-S}{mB}\Bigr)\frac{S}{2m+S}.$$
Since Scenario 2a and Scenario 2b are mutually exclusive events,
$$P\{\text{Scenario 2}\} = P\{\text{Scenario 2a}\} + P\{\text{Scenario 2b}\}
= P_{in}^2\Bigl(\frac{m-1}{2m+S} + \Bigl(\frac{m-1}{2m}+\frac{B-S}{mB}\Bigr)\frac{S}{2m+S}\Bigr).$$
Further,
$$AT_{2a} = \frac{3I}{2} + B + L_2 + \frac{m(I+B)}{2} + S, \qquad
AT_{2b} = \frac{B}{2} + I + L_2 + \frac{m(I+B)}{2} + S,$$
so that
$$E[AT_2 \mid \text{Scenario 2}] = P\{\text{Scenario 2a}\}\,AT_{2a} + P\{\text{Scenario 2b}\}\,AT_{2b}$$
$$= P_{in}^2\,\frac{1}{2m+S}\Bigl((m-1)\Bigl(\frac{3I}{2}+B+L_2+\frac{m(I+B)}{2}+S\Bigr)
+ \Bigl(\frac{m-1}{2m}+\frac{B-S}{mB}\Bigr)S\Bigl(\frac{B}{2}+I+L_2+\frac{m(I+B)}{2}+S\Bigr)\Bigr).$$

Proof of Lemma 5.
$$P\{\text{Scenario 3}\} = P_{in}\,P\{M\}\,(1-P_{in})
= P_{in}(1-P_{in})\Bigl(\frac{m-1}{2m+S} + \Bigl(\frac{m-1}{2m}+\frac{B-S}{mB}\Bigr)\frac{S}{2m+S}\Bigr).$$

Proof of Lemma 6.
$$P\{\text{Scenario 4}\} = P_{in}\,P\{PM\}\,P_{in} = P_{in}^2\,\frac{S}{mB}\cdot\frac{S}{2m+S}.$$

Proof of Lemma 7.
$$P\{\text{Scenario 5}\} = P_{in}\,P\{PM\}\,(1-P_{in}) = P_{in}(1-P_{in})\,\frac{S}{mB}\cdot\frac{S}{2m+S}.$$

Proof of Lemma 9. The expected distance from the end of the index to the DCI, given that the DCI has not been missed, is
$$L_1 = E\{\text{distance to DCI} \mid \overline{M}\}
= \sum_{i=2}^{m}\sum_{j=i}^{m} E\{\text{distance to DCI} \mid TI_i, D_j\}\,P\{TI_i\}\,P\{D_j\}
= \frac{1}{m^2}\sum_{i=2}^{m}\sum_{j=i}^{m}\Bigl(\frac{B}{2}+(j-i)(I+B)\Bigr)$$
$$= \frac{1}{m^2}\sum_{i=2}^{m}\Bigl(\frac{(m-i+1)B}{2} + \frac{(m-i+1)(m-i)}{2}\,(I+B)\Bigr)
= \frac{B(m-1)}{4m} + \frac{(I+B)(m-1)(m-2)}{6m}.$$

Proof of Lemma 10. The expected distance from the end of the index to the next broadcast, given that the DCI has been missed, is
$$L_2 = E\{\text{distance to next broadcast} \mid M\}
= \sum_{i=1}^{m-1}\sum_{j=i+1}^{m} E\{\text{distance to next broadcast} \mid TI_j, D_i\}\,P\{TI_j\}\,P\{D_i\}$$
$$= \frac{1}{m^2}\sum_{i=1}^{m-1}\sum_{j=i+1}^{m}\bigl(B + (I+B)(m-j)\bigr)
= \frac{1}{m^2}\sum_{i=1}^{m-1}\Bigl((m-i)B + (I+B)\,\frac{(m-i)(m-i-1)}{2}\Bigr)
= \frac{1}{2m}\Bigl((m-1)B + (I+B)\,\frac{2m^2-6m+4}{6}\Bigr).$$

Proof of Lemma 11.
$$P\{DCI \in BR\} = p_{h,in}\,p_h + p_{c,in}\,p_c
= \Bigl(1-\Bigl(1-\frac{(1-p_0)\,p_h}{b_h}\Bigr)^{N}\Bigr)p_h
+ \Bigl(1-\Bigl(1-\frac{(1-p_0)\,p_c}{b-b_h}\Bigr)^{N}\Bigr)p_c.$$

Proof of Lemma 13.
$$P\{\text{Scenario 7}\} = P\{\overline{M},TI\} = P\{\overline{M} \mid TI\}\,P\{TI\} = \frac{m-1}{2m+S}.$$

Proof of Lemma 14.
$$P\{\text{Scenario 8}\} = P\{\overline{M},TD\} = P\{\overline{M} \mid TD\}\,P\{TD\} = \frac{m-1}{2m}\cdot\frac{S}{2m+S}.$$

Proof of Lemma 15.
$$P\{\text{Scenario 9}\} = P\{M,TI\} = P\{M \mid TI\}\,P\{TI\} = \frac{m+1}{2m+S}.$$

Proof of Lemma 16.
$$P\{\text{Scenario 10}\} = P\{M,TD\} = P\{M \mid TD\}\,P\{TD\} = \frac{mB+B-2S}{2mB}\cdot\frac{S}{2m+S}.$$

Proof of Lemma 17.
$$P\{\text{Scenario 11}\} = P\{PM,TD\} = P\{PM \mid TD\}\,P\{TD\} = \frac{S}{mB}\cdot\frac{S}{2m+S}.$$

Proof of Lemma 18.
$$P\{\text{Scenario 12}\} = 1 - P\{DCI \in BR\},$$
and substitute the result of Lemma 11.
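As a consistency check on the scenario probabilities of Lemmas 13 through 17 (under our reading of the reconstructed expressions), Scenarios 7 through 11 should together be exhaustive and mutually exclusive, so their probabilities should sum to one. The following sketch verifies this symbolically.

```python
import sympy as sp

m, B, S = sp.symbols('m B S', positive=True)

P7  = (m - 1) / (2*m + S)                          # Lemma 13: index tune-in, no miss
P8  = (m - 1) / (2*m) * S / (2*m + S)              # Lemma 14: data tune-in, no miss
P9  = (m + 1) / (2*m + S)                          # Lemma 15: index tune-in, miss
P10 = (m*B + B - 2*S) / (2*m*B) * S / (2*m + S)    # Lemma 16: data tune-in, miss
P11 = S / (m*B) * S / (2*m + S)                    # Lemma 17: data tune-in, partial miss

print(sp.simplify(P7 + P8 + P9 + P10 + P11))       # prints 1
```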