Broadcast Protocols to Support Efficient Retrieval from Databases by Mobile Users

Anindya Datta, The University of Arizona; Debra E. VanderMeer, Tandem Computers; Aslihan Celik, The University of Arizona; and Vijay Kumar, University of Missouri-Kansas City

Mobile computing has the potential for managing information globally. Data management issues in mobile computing have received some attention in recent times, and the design of adaptive broadcast protocols has been posed as an important problem. Such protocols are employed by database servers to decide on the content of broadcasts dynamically, in response to client mobility and demand patterns. In this paper we design such protocols and also propose efficient retrieval strategies that may be employed by clients to download information from broadcasts. The goal is to design cooperative strategies between server and client to provide access to information in such a way as to minimize energy expenditure by clients. We evaluate the performance of our protocols both analytically and through simulation.

Categories and Subject Descriptors: H.2.4 [Database Management]: Systems; H.2.8 [Database Management]: Applications; C.2.1 [Computer-Communication Networks]: Network Architecture and Design--Wireless communication; H.4.3 [Information Systems Applications]: Communications Applications

General Terms: Algorithms, Performance

Additional Key Words and Phrases: Adaptive Broadcast Protocols, Mobile Databases, Energy Conservation, Client-Server Computing

Authors' addresses: A. Datta, MIS Dept., University of Arizona, Tucson, AZ 85721; D. E. VanderMeer, Tandem Computers, 10501 N. Tantau Avenue, Cupertino, CA 95014; A. Celik, MIS Dept., University of Arizona, Tucson, AZ 85721; V. Kumar, Dept. of Computer Science and Telecommunications, University of Missouri-Kansas City, Kansas City, MO 64110.

An early version of this paper previously appeared in Proc. Intl. Conf. on Data Engineering, Birmingham, U.K., April 1997.

1. INTRODUCTION

One of the most important information dissemination strategies in the current fast-paced corporate and personal environments is to employ wireless networks. Furthermore, the data-hungry users who access such networks are expected to be mobile. In fact, a widely accepted information access model envisions millions of highly mobile users carrying small, battery-powered palmtop terminals equipped with wireless connectivity. The types of information that may potentially be accessed are virtually boundless, e.g., financial, weather, traffic, sports and airline schedules.
While the challenges of designing and implementing wireless networks are important ones for the telecommunications research community, data management issues related to organizing and retrieving information from wireless channels have posed challenges for the database community as well. Recent advances in wireless technology have made data access possible for mobile users, who are geographically dispersed. However, there are a number of hardware and software problems which must be resolved before the capabilities of mobile computing can be fully utilized. Some of the software problems, such as data management, transaction management, database recovery, etc., have their origins in distributed database systems. In mobile computing, however, these problems become more difficult to solve, mainly because of the narrow bandwidth of the wireless communication channels, the relatively short active life of the power supplies (batteries) of mobile units, and the changing locations of required information due to client mobility.

A widely accepted paradigm for disseminating information in a disconnected and mobile environment is the use of broadcasts [Vaidya and Hameed 1996a; Barbara and Imielinski 1994; Imielinski et al. 1994a; Imielinski et al. 1994b]. In this model a "server" broadcasts information and "clients" tune in to the broadcast to retrieve their data of interest. The widespread acceptance of the broadcast paradigm is mainly due to the notion of broadcast asymmetry, where the "downstream" (server to client) communication capacity is relatively much greater than the "upstream" (client to server) communication capacity. The two major issues of interest in this scenario are (a) how and what does the server broadcast, and (b) how does the client retrieve? Of particular interest are solutions that enable clients to retrieve their data of interest efficiently, as well as with a minimum of energy expenditure. Of course, the two questions above are composed of several different sub-issues, as explained later in this paper.

In this paper we deal with the problem of data retrieval by mobile units from wireless broadcasts. The goal is to design and evaluate adaptive broadcast protocols to support efficient information retrieval while simultaneously attempting to minimize energy consumption by the mobile units. The paper is organized as follows. We provide a brief review of existing work in the mobile computing area in section 2 and a description of our mobile computing architecture in section 3. We then go on to stipulate a concrete problem statement in section 4, discuss some preliminary information in section 5 and describe our protocols in section 6. Subsequently, we evaluate the performance of our protocols analytically in section 7 and analyze and discuss the performance in section 8. In order to overcome some of the limitations of our analytical model, we further performed an empirical performance study, through simulation. We describe the simulation model in section 9, and discuss the results of our experiments in section 10. We conclude in section 11.

1.1 Contribution of this Paper

In recent times there has been some interesting work investigating data issues in mobile environments (see section 2 for an extended discussion). In particular, very interesting research in the area of broadcasts in mobile environments is reported by Imielinski, Badrinath et al. at Rutgers (see, e.g., [Imielinski et al.
1997; Barbara and Imielinski 1994; Imielinski and Badrinath 1992; Imielinski et al. 1994a; Imielinski et al. 1994b]) and by Vaidya et al. at Texas A&M (see, e.g., [Vaidya and Hameed 1996a; Vaidya and Hameed 1996b]). Most of this work is concerned with access methods, i.e., how to organize the broadcast data. In particular, Imielinski, Badrinath et al. have treated indexing in broadcasts extensively, including how to organize data and indexes in broadcasts. Vaidya et al. report strong results on broadcast scheduling, i.e., when to broadcast particular items, such that response times (termed "access times" in these papers) are minimized. In reporting this work, researchers have suggested various methods of indexing information, or have designed algorithms to maintain cache consistency at clients. This work presupposes that the composition of the broadcast is known. In other words, given the composition of a broadcast, there has been much good work on organizing data and indexes on air, as well as on scheduling broadcasts.

Our concern is different. In [Imielinski et al. 1997], while discussing relevant future work, the authors mention that the most likely mode of retrieval in mobile environments is going to be mixed, i.e., the most frequently accessed items are broadcast and other items are provided on demand. To support such a mixed technique, the authors remark, adaptive broadcast protocols need to be designed. Such protocols will dynamically alter the broadcast content depending upon client demand patterns. In this paper we propose and evaluate, both analytically and empirically, the performance of two such protocols. To the best of our knowledge, there do not exist, in the published literature, other examinations of the same topic.

Aside from the dynamic determination of broadcast content, which concerns the server side, this paper also delves deeply into the client side. Specifically, we address the problem of efficient and power-conserving retrieval by clients. To this end, this paper proposes a number of retrieval protocols that a client might use to download data from broadcasts. Thus, this paper offers an integrated treatment of broadcast content determination and retrieval, achieved by designing cooperative protocols between the server and client sides. This is also a novel aspect of this work. We would like to note, however, that we draw heavily upon existing work. Indeed, this work, together with the work cited above, forms a comprehensive theory of organizing, managing and retrieving from broadcasts in a wireless, mobile environment.

2. RELATED WORK

The primary reference area of this paper is data management in wireless networks in general and in mobile computing in particular. There exists much work in the area of mobile computing, particularly from a telecommunications perspective. This work is somewhat orthogonal to our purposes. Therefore, rather than cite these papers, we refer the reader to the following excellent web site: http://maui.ee.tu-berlin.de/bibl/. This site, at last count, contained over 200 papers on mobile networking and is an excellent source of background material. Our primary interest, however, is the area of data management in the context of mobile computing. This area has received considerable research attention in the recent past.
Good discussions of several data management issues in mobile computing scenarios may be found in [Alonso and Korth 1993; Dunham and Helal 1995; Imielinski and Badrinath 1994; Forman and Zahorjan 1994]. While the aforementioned papers are somewhat high level, there are also papers that deal in detail with specific issues. The architecture of database systems in mobile environments receives detailed treatment in [V. Anantharam and Wei 1994; Malyan et al. 1993]. In [V. Anantharam and Wei 1994], the authors propose a dynamic programming strategy to optimize the mapping of a database hierarchy to the network configuration. In [Malyan et al. 1993], a complete architecture for wireless communication is proposed, which includes a hierarchical database structure to implement the HLR/VLR user location scheme.

Location management is yet another important topic in this area, primarily concerned with designing strategies to keep track of mobile users. In [P. Krishna and Pradhan 1994] a number of search strategies are proposed to locate users. Another notable paper in location management is [D. Lam and Widom 1995], which proposes a framework for modeling and evaluating the performance of location management schemes for a Personal Communication Services (PCS) network. In [Shivakumar and Widom 1994], user profile replication is proposed as a mechanism for rapid location lookup. A recent paper [Satyanarayanan 1996] considers the "span of access" under changing location conditions and offers a summary of research in this area.

Data replication is considered in [Badrinath and Imielinski 1992], which studies cache consistency for replicated data. In [Wolfson and Jajodia 1992], dynamic data replication algorithms are proposed for tree and star networks. Another interesting paper in this area is [Huang et al. 1994], which considers the problem of data allocation across a large number of databases accessed through a wireless network, and proposes strategies to optimize communication costs.

Data retrieval from broadcasts is the topic of this paper. A number of papers have appeared on this subject, including [Barbara and Imielinski 1994; Imielinski et al. 1994a; Imielinski et al. 1994b; Imielinski et al. 1997; Vaidya and Hameed 1996a; Vaidya and Hameed 1996b]. This set of papers forms the background of the work reported in this document. The papers from Rutgers [Barbara and Imielinski 1994; Imielinski et al. 1994a; Imielinski et al. 1994b; Imielinski et al. 1997] primarily deal with data organization in a broadcast, e.g., index organization and structure and cache invalidation strategies. We, on the other hand, are concerned with how to determine optimal broadcast content and design "good" retrieval strategies for users. Note that the two bodies of work (theirs and ours) are complementary: our strategies and theirs put together form a comprehensive broadcasting framework. The papers by Vaidya et al. deal with, among other issues, the notion of broadcast scheduling, i.e., when to broadcast particular items. The metric optimized is access time, i.e., response time. Aside from a basic difference in the broad objectives (we are concerned with broadcast content determination and client retrieval strategies), another point of departure of our work is that we are concerned not only with access time, but also with "tuning time" (the duration of time clients actively listen).
Thus, both efficiency and power conservation are of interest to us.

Finally, a very relevant work from our perspective is the work on Broadcast Disks (BD) performed by researchers at Brown, Maryland and Matsushita Labs. A BD involves the determination of a broadcast program containing all the data items to be disseminated and the subsequent transmission of this program in a periodic manner. The repetitive nature of the broadcast allows the medium to be used as a type of storage device, hence the name "broadcast disk". The novelty of this work is that it proposes a "multi-level" structuring mechanism that allocates bandwidth to data items based on the access importance of these items. BDs are motivated by the notion of communication asymmetry introduced in [Zdonik et al. 1994]. In [Acharya et al. 1995] the authors study cache management policies under the BD paradigm, and extend this work to include prefetching in [Acharya et al. 1996b]. The impact of updates on the base data is studied in [Acharya et al. 1996a], and the impact of a backchannel by which clients can "pull" data items is studied in [Acharya et al. 1997]. While BDs are related to the work presented in this paper, there exist some fundamental differences between them. First, BDs have been proposed for (primarily push-based) "information dissemination" frameworks in general. While the mobile broadcast scenario considered here qualifies as an information dissemination framework, the issue of mobility introduces features that the BD framework does not address [Badrinath 1997; Franklin 1997].

- Firstly, BDs are constructed based on known access probabilities; these are very hard, if not impossible, to know in a mobile framework where clients move in and out of cells at will. Our strategies consider stateless servers, i.e., servers have no a priori knowledge of client movement or access patterns.
- Secondly, for the abovementioned reason (i.e., clients moving in and out of cells), the broadcast content needs to be dynamic, whereas BDs provide a static broadcast content. Even in [Acharya et al. 1997], where the backchannel is studied, the authors make it a point to say that the "broadcast is static", and dynamically changing broadcast content is identified as an area of future study. We have created a framework where the broadcast adapts to changing client access patterns.
- Thirdly, in the BD framework client demands are completely satisfied, i.e., all the demanded items are included in the broadcast. In a mobile scenario, however, the broadcast period may be too small for the inclusion of all items (particularly if some items are broadcast more often than others, as in the BD framework); this means that some method is needed to include certain items in the broadcast in preference to others. The BD framework is unable to handle this. We have created a sophisticated prioritization mechanism to handle this issue.
- Finally, the BD framework exclusively considers latency as the primary performance metric; no attention is paid to energy conservation by the clients. We, on the other hand, simultaneously pay attention to both.

In summary, while the BD work is interesting and relevant, this framework has significant limitations when applied to mobile computing environments. However, a very interesting line of study would be to adapt the BD framework to make it suitable for use by mobile clients and servers, by addressing some of the issues identified above.
3. ARCHITECTURE ISSUES

In this section we give a brief overview of the architectural assumptions made in this paper, both with regard to the underlying networked mobile environment and to the database.

3.1 Mobile Environment

We use a general architecture of a mobile platform, shown in Figure 1 [Dunham and Helal 1995]. It is a distributed architecture in which a number of computers, fixed hosts and base stations, are interconnected through a high-speed wired network. Fixed hosts are general purpose computers which are not equipped to manage mobile units, but can be configured to do so. Base stations are equipped with wireless interfaces and communicate with mobile units to support data access.

[Fig. 1. A general architecture of a mobile platform: fixed hosts and base stations interconnected by a high-speed wired network, with mobile units crossing between wireless radio cells and a wireless LAN cell over wireless links.]

Mobile units are battery-powered portable computers, which move around freely in a restricted area, referred to here as a geographic mobility domain. The size restriction on a unit's mobility is mainly due to the limited bandwidth of wireless communication channels. To manage the mobility of units, the entire geographic mobility domain is divided into smaller domains called cells. The mobile discipline requires that the movement of mobile units be unrestricted within the geographic mobility domain (inter-cell movement).

The mobile computing platform can be effectively described under the client/server paradigm. Thus we sometimes refer to a mobile unit as a client and sometimes as a user. The base stations are identified as servers. Each cell is managed by a base station, which contains transmitters and receivers for responding to the information processing needs of clients located in the cell. We assume that the size of a cell is such that the average query response time is much smaller than the time required by the client to traverse it. Therefore, it will seldom occur that a user submits a query and exits the cell before receiving the response. Clients and servers communicate through wireless channels. The communication link between a client and a server may be modeled as multiple data channels, or as a single channel [Imielinski et al. 1997]. We assume a single channel for reasons outlined in [Imielinski et al. 1997]. We further assume that this channel consists of both an uplink for moving data from client to server and a downlink for moving data from server to client.

3.2 Database Architecture and its Characteristics

Data replication and partitioning are broad areas of research in themselves; they are not the concern of this paper. Here, we assume full database replication. The data is characterized as rapidly changing [Datta 1994]; users often query servers to remain up-to-date. More specifically, they will often want to track every broadcast for their data item of interest. Typical examples of this type of data are stock, weather, and airline information. We assume the following to fully characterize our mobile database.

1. The database is updated asynchronously, i.e., by an independent external process, which does not affect the protocols designed in this paper.
Also, such updates arrive with high frequency, signifying that the database is rapidly changing. Examples of such information are stock, weather, etc.

2. Users are highly mobile and randomly enter and exit cells. There is a parameter called Residence Latency (RL) which characterizes the average duration of a user's stay in the cell. We provide a full explanation of RL in section 5.

3. User reference behavior is localized; e.g., some stocks are more popular than others.

4. Servers are stateless, i.e., they maintain neither client arrival and departure patterns nor client-specific data request information [Imielinski et al. 1997]. We assume a stateless server because we believe that the cost of maintaining a stateful server in a mobile environment would be prohibitively expensive. We want to emphasize, however, that our scheme will work with stateful servers as well.

4. PROBLEM STATEMENT

Wireless networks differ from wired networks in many ways. Database users on a wired network remain connected not only to the network, but also to a continuous power source. Thus, response time is the key performance metric. In a wireless network, however, both the response time and the active life of the user's power source (battery) are important. While a mobile unit is listening or transmitting on the line, it is considered to be in active mode. Assuming a power source of 10 AA batteries and a laptop equipped with CD-ROM and display, estimated battery life in active mode is approximately 2.7 hours [Imielinski et al. 1997]. In order to conserve energy and extend battery life, a client slips into doze (standby) mode, in which she is not actively listening on the channel. Clients expend significantly less energy in doze mode than in active mode. Therefore, one of the major goals of our scheme is to minimize the amount of time a client must spend in active mode to retrieve the data items she requests.

The problem addressed in this paper may be captured by the following question: given that users are highly mobile in their mobility domain, what are good strategies that the server can use to decide on what to broadcast? The assumption is that such strategies need to adapt to user demand patterns in the highly mobile environment. We are also interested in the question of retrieval strategies: given that good broadcast strategies are found, what are good retrieval algorithms by which users can retrieve/download data from broadcasts with a minimum of energy expenditure? The basic idea, inspired by suggestions in [Imielinski et al. 1997], is one of "mixed broadcasting", i.e., automatic as well as on-demand broadcasting.

We follow an approach similar to that presented in [Imielinski et al. 1997] for characterizing the performance of our algorithms. We define the following terms:

Access Time (AT): Access time refers to the time elapsed between query submission and receipt of the response.

Tuning Time (TT): Tuning time is the duration of time that the client spends actively listening on the channel.

The meaning of these terms is illustrated in figure 2. Consider a client who submits a request at time T0 and receives the response at time T7.

[Fig. 2. Access and tuning times, shown on a timeline T0 through T7.]

In this scenario, if the client listens continuously from the time a query is submitted until the response is received, then AT = TT = (T7 - T0). On the other hand, if the client slips into doze mode intermittently, then TT is noticeably less than AT, significantly reducing battery usage. In this case AT = (T7 - T0), and TT = (T7 - T6) + (T5 - T4) + (T3 - T2) + (T1 - T0). This results in energy conservation, as the client is in active mode for only short periods of time.
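To make the two metrics concrete, the following small sketch computes AT and TT for a hypothetical client that listens during four short intervals; all numbers are illustrative and of our choosing, not from the paper.

    # Illustrative only: a client submits a query at T0 = 0, receives the
    # answer at T7 = 70, and is awake only during four tune-in intervals.
    listen_intervals = [(0, 10), (25, 30), (40, 45), (60, 70)]  # (tune-in, tune-out)

    AT = listen_intervals[-1][1] - listen_intervals[0][0]   # T7 - T0 = 70
    TT = sum(out - tin for tin, out in listen_intervals)    # 10+5+5+10 = 30

    assert TT <= AT  # the gap between the two is time spent dozing, i.e., energy saved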
The question, of course, is how to determine the smallest possible tuning intervals. An ideal approach appears to be providing the client with precise knowledge of when her requested information will be broadcast. Our aim is to find optimal points in the two-dimensional space of AT and TT. This is difficult because there appears to be a trade-off between AT and TT: attempts to reduce one tend to increase the other. For example, access time is directly related to the size of the broadcast, i.e., AT is smaller for a smaller broadcast size. On the other hand, providing information for selective auto-tuning, i.e., informing the user precisely where her required data is located in the broadcast, reduces tuning time. However, inclusion of such tuning information would increase the overall size of the broadcast by adding overhead, which in turn could increase AT. Conversely, eliminating this overhead will reduce AT at the expense of an increased TT, because the user will not know precisely when to tune in.

5. PRELIMINARIES

Before delving into the details of our proposed strategies, we first describe certain notions that we will use in developing those strategies.

Broadcast Set and Content: The broadcast set refers to the set of data items included in a broadcast. Broadcast content refers to the composition of a broadcast, i.e., the data tuples along with the access and overhead information included in a broadcast. Note that while the broadcast sets of two successive broadcasts may be identical, the content may change due to changes in the actual data values that may have occurred in the meantime. For example, while IBM stock prices at various markets may be broadcast repeatedly, the actual data tuples corresponding to the IBM prices may change. A major goal of this paper is to devise strategies to determine what the broadcast set should be, given the high degree of mobility as well as the particular demand patterns of clients. In particular, the server must make two decisions: (a) identify items to include in the broadcast, and (b) determine the temporal duration that a particular data item needs to be broadcast, once it is included.

Broadcast Periodicity: The periodicity of the broadcast is important as it affects both the access and tuning times. We consider both periodic and aperiodic broadcasts in our strategies.

Data Organization: Data organization refers to how the broadcast content is organized, e.g., how data and indices are interleaved. Data organization has been addressed very thoroughly in [Imielinski et al. 1997] and is not a research issue in our work; we make use of the results presented there. In particular, for our strategies, we assume a (1,m) indexing strategy as presented in [Imielinski et al. 1997]. This strategy posits that a complete index for the broadcast is repeated every (1/m)th of the broadcast. In other words, the entire index occurs m times during a broadcast, where m is a parameter.

Residence Latency (RL) and Estimated Departure Time (EDT): When the server decides to include an item in its broadcast, it also needs to decide the
length of time this item will remain in its broadcast set. In order to identify the residency duration of a data item in a broadcast, we associate a Residence Latency (RL) value with each cell. RL identifies the duration an item would remain in a broadcast for that cell. The RL value for a specific cell is the average length of time a mobile user resides there. RL for a cell could be computed a priori based on advance knowledge of user movement patterns and cell geography. We believe this is feasible based on information currently available. A data item's Estimated Departure Time (EDT) from a broadcast is computed by adding the item's entry time into the broadcast and the cell's RL.

Popularity Factor: The popularity factor of item X at time T, denoted by PF_X^T, identifies the number of clients in the cell at time T who are interested in X. Henceforth we drop the time superscript in the PF expression without any loss of generality. The PF of item X is maintained in the system as follows. When a client requests X, PF_X is increased by 1. However, every time PF_X is incremented, the system records the corresponding time. Let the timestamp of the ith increment to PF_X be denoted by T_X^i. Then, a corresponding decrement of 1 is performed on the value of PF_X at time (T_X^i + RL). This reflects the (anticipated) departure of the client whose request caused the ith increment.

We now describe how PF for different data items can be computed efficiently. Recall that a popularity factor (PF) is maintained for each data item; each new request for a data item arriving at time t increments its PF by 1, and the PF is subsequently decremented by 1 at time t + RL. Given that there may be a large number of data items in the database, and large client loads, it may appear that maintaining PF is an expensive task. Of particular concern is the scalability of the PF maintenance process with increases in database size. However, as we show below, PF can be maintained extremely efficiently by the use of simple data structures. In particular, the time efficiency of this process is independent of the database size.

We assume the existence of a global clock, which is the basis of synchronization of all time-dependent occurrences in the system. We maintain a queue, termed the PF Decrement Queue (PDQ), in which Decrement Instructions (DIs) are enqueued. A DI is generated every time the PF of any item is increased, and stores the time at which the popularity factor must be decremented. It may be thought of as a 2-tuple <data item, timestamp>. Thus, a DI <X, t_i> indicates that at time t_i, PF_X should be decremented by 1. Upon generation, a DI is immediately enqueued in the PDQ. The PF Maintenance Process (PFMP) monitors the head of this queue. At each time advancement epoch (granularity to be decided by the system designer), the PFMP examines the DI at the head of the PDQ and compares its timestamp to the current time. If the current time is greater than or equal to the timestamp, the DI is dequeued and executed, i.e., the PF of the data item referenced by the DI is adjusted. Note that this adjustment is a basic arithmetic operation, and is thus extremely fast. Moreover, since there is only one PF value per data item, these values can be easily maintained in some sort of directly addressable data structure in memory.
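The sketch below shows one way the PDQ and PFMP just described might be realized; it is a minimal illustration assuming an integer global clock and a fixed RL, not the authors' implementation. Because RL is constant, DIs enter the queue already ordered by timestamp, so only the head needs checking.

    from collections import deque

    PF = {}         # popularity factor per data item
    PDQ = deque()   # PF Decrement Queue: DIs as (data item, decrement time)

    def on_request(item, now, RL):
        """A client request increments PF and schedules the matching decrement."""
        PF[item] = PF.get(item, 0) + 1
        PDQ.append((item, now + RL))       # DI <item, t + RL>

    def pfmp_tick(now):
        """Run at each time-advancement epoch: execute every DI that is due."""
        while PDQ and PDQ[0][1] <= now:    # head of the queue holds the earliest DI
            item, _ = PDQ.popleft()
            PF[item] -= 1                  # anticipated departure of the requester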
We do not anticipate the size of this data structure being overly large; e.g., if a broadcast were to disseminate stock information for every publicly traded stock on earth, it would need to keep track of around 8000 data items (2900 for stocks traded on the NYSE [Inc. 1997]). Clearly, therefore, the complexity of the PFMP process is quite low: of the order of a few instructions per unit time (one comparison, one (possible) addition and one dequeue). This is well within the capabilities of even current desktop machines, let alone the powerful server hardware that would very likely be used for cell servers. Note that the complexity of this process is independent of the size of the database. Also, the rapidity of the PFMP process does not allow the PDQ to grow too large, which is an added benefit.

Dynamism of the Underlying Database: It has been widely suggested (e.g., see [Imielinski et al. 1997]) that certain information that people are likely to access in a mobile fashion is dynamic, e.g., stock information, weather and traffic. This rapidly changing nature makes these databases more difficult to manage. In this paper we are interested in these types of systems. The rapidly changing nature of the database has some effects on the broadcasting strategy. In particular, in this scenario, there is a reasonable likelihood that the database state will change from one broadcast to another. Such changes often occur in bursts; e.g., a large fraction of stock trades occur in the last hour of trading. At these times, clients are particularly interested in retrieving current values for their data items of interest. Moreover, at these times, clients would often download data items from every broadcast to keep themselves apprised of the rapidly changing current state. It is easily seen that this can be quite expensive, as the client needs to tune in to every broadcast. We are particularly interested in minimizing a client's cost (i.e., energy expenditure) in such scenarios. Our basic strategy may be explained through the following example. Assume that a client is interested in a particular data item, corresponding to which there exist B buckets in a broadcast. Assume further that the client would like to download the same information in the following broadcast. Now, if only B_dirty buckets have changed, where B_dirty < B, it is advantageous for the client to simply download the dirty (i.e., modified since the previous broadcast) buckets from the subsequent broadcast, rather than the full set of B buckets. One can easily see that such savings quickly accumulate over a few successive broadcasts. In order to execute this strategy, we maintain certain special metadata in buckets which allows clients to selectively tune in to dirty buckets. That metadata, and our entire broadcast structure, is explained below.
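Before turning to that structure, a quick back-of-the-envelope calculation (numbers ours, purely illustrative) shows how fast these savings accumulate:

    # A DCI of B = 20 buckets, of which only B_dirty = 3 change per broadcast,
    # tracked over n = 10 successive broadcasts.
    B, B_dirty, n = 20, 3, 10
    naive_cost = B * n                   # 200 buckets read, re-downloading everything
    smart_cost = B + B_dirty * (n - 1)   # 47 buckets: one full read, then dirty only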
5.1 Broadcast Structure

Broadcast structure refers to the specific broadcast organization that we assume in developing our strategies. It is important to understand this structure in order to properly appreciate our protocols. As mentioned before, we assume the (1,m) indexing strategy outlined in [Imielinski et al. 1997]. In this scheme, index information is provided at regular intervals in the broadcast: a complete index is inserted m times in a broadcast at regular intervals. Figure 3 illustrates our broadcast structure.

[Fig. 3. Broadcast structure: (A) a broadcast as a sequence of index segments and data blocks; (B) a data block as a sequence of data clusters; (C) a data cluster as a sequence of data buckets with the same key value, each carrying an <X,Y,Z,N,EB> tuple; (D) an index segment as a sequence of index buckets containing <K,B,C,EC> tuples.]

A broadcast is a sequence of data blocks (containing data) and index segments (containing access information), as shown in figure 3A. Using the (1,m) data organization methodology, an index segment appears every (1/m)th of the broadcast, i.e., there are m index segments. Clearly, each of the m data blocks is also of equal size. Each data block is composed of one or more data clusters, as shown in figure 3B, where each data cluster consists of a collection of tuples of a single data item of interest. For example, assume the broadcast consists of stock information, and each tuple is a triple <stock id, price, market>. In such a scenario, the data items of interest would be represented by stock ids. Consider a particular stock id, e.g., IBM. All IBM records would comprise a data cluster. A data cluster may span more than one data block. Data clusters are composed of data buckets (figure 3C), which contain data records as well as some tuning information (denoted by the 5-tuple <X, Y, Z, N, EB> in the figure), explained below. We assume that each client has her own item of interest (e.g., clients are not interested in all stocks, but instead in specific ones). For the purposes of this study, we assume a client has a single data item of interest. As explained above, all records pertaining to this item appear in a specific data cluster, which we refer to as the client's Data Cluster of Interest (DCI). Within the broadcast, the data clusters are organized in order of decreasing popularity based on PF, such that the most popular item is broadcast first, and the least popular item last. This helps to reduce the access times for popular items.

An index segment is a series of buckets containing index tuples and some other special tuning information. We first describe the index tuples. Each index tuple is a 4-tuple, <K, B, C, EC>, that not only informs the client precisely where the DCI appears in the broadcast, but also provides information about updates to the cluster since the previous broadcast. The structure of the index segment is shown in figure 3D. K, B, C and EC are defined below.

K: the cluster's key value (e.g., for an IBM cluster, the key value is IBM).
B: the id of the first bucket of the data cluster.
C: the offset from B to the first dirty bucket (a bucket where changes have occurred since the last broadcast) of the data cluster. If all buckets in the data cluster are clean, C takes a default value of -1.
EC: the EDT of the cluster key, i.e., when the cluster is scheduled to be dropped from the broadcast.

The dirty/clean information (i.e., B and C) is included to handle the rapidly changing data scenario explained earlier in this section. We assume a B-tree structure for the index. Thus, clients must begin reading the index at the root in order to find the pointers to their DCIs.

As mentioned above and shown in figures 3C and D, all buckets, whether index or data, carry a special tuple displayed as a 5-tuple <X, Y, Z, N, EB>. This information is provided to orient clients as they initially tune in to a broadcast. The X, Y, Z, N and EB terms are defined as follows.

X: an offset to the first bucket of the next nearest index segment.
Y: an offset to the end of the broadcast, i.e., the start of the next broadcast.
Z: shows the bucket's type (data or index) and contains tuning information for items updated since the previous broadcast.
It can hold one of four possible values:

Z = -2 indicates an index bucket.
Z = 0 indicates a data bucket, and that the bucket is clean, i.e., unmodified since the previous broadcast.
Z = i, where i is a positive integer, indicates a data bucket, and that the bucket is dirty, i.e., modified since the previous broadcast. Moreover, the actual i value is an offset to the next dirty bucket in the same data cluster.
Z = -1 indicates a data bucket, and that it is the last dirty bucket of the data cluster.

N = 0 indicates a data bucket, and that it is the last data bucket of the data cluster.
N = i indicates that this is not the last bucket of a DCI, and that the offset to the next data bucket of the same DCI is i.

EB: the EDT of the data item in the bucket. Obviously, the EB value of every bucket in the same data cluster is identical and equal to the EDT of the cluster key (e.g., in an IBM cluster, all EB values equal the EDT of the IBM data item).

While the reasons for including X, Y and EB are obvious, the utility of N and Z is not. The N values in each bucket are used as a means to identify the last bucket of a data cluster. In the indexing scheme used, there is a possibility that an index segment is inserted within a set of data buckets of the same data item, i.e., the data cluster is fragmented across two or more data blocks. The N value gives the offset to the next data bucket of the same DCI, thus making it possible to listen to successive data buckets regardless of whether or not an index exists between them. The Z values are used to facilitate efficient retrieval in the rapidly changing data scenarios explained earlier in this section; this will be illustrated in section 6.2.

We fix the unit of retrieval as a bucket, following the lead of [Imielinski et al. 1997], where a bucket is defined as the smallest logical unit of broadcast. All buckets (data or index) are of equal size. This allows for a direct mapping between time and buckets; in fact, time is measured in terms of buckets in this paper. This means that all the offsets referred to in the preceding description are in terms of buckets.

Finally, we add a note regarding the notion of data clusters. A data cluster consists of a sequence of buckets containing data tuples pertaining to a specific key, such as a specific stock id. Moreover, we assume that selectivity, i.e., the number of buckets required to broadcast a specific data item, is fixed. This is a reasonable assumption. Consider, for example, stock information expressed as a 3-tuple <stock id, market, price>. Clearly, for a particular key value (say IBM), the number of tuples is equal to the number of markets (e.g., New York Stock Exchange) where the stock is traded. This number is fairly constant, making the number of buckets for a particular cluster fairly stable. We also assume that specific buckets within a cluster have identical semantic content from one broadcast to another. For instance, the first bucket within the IBM cluster will always contain the tuples related to the IBM prices in New York and Frankfurt (assuming there are 2 tuples per bucket). Thus, while the individual prices may change, the semantic content of each bucket within a cluster is invariant.
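For reference, the two tuples defined in this section can be summarized as plain records. The sketch below is only a notational aid: the field comments paraphrase the definitions above, while the class names are ours.

    from dataclasses import dataclass

    @dataclass
    class TuningInfo:   # the <X,Y,Z,N,EB> tuple carried by every bucket
        X: int          # offset to first bucket of the next nearest index segment
        Y: int          # offset to end of broadcast (start of the next broadcast)
        Z: int          # -2: index bucket; 0: clean data bucket;
                        # i > 0: dirty, offset to next dirty bucket; -1: last dirty bucket
        N: int          # 0: last data bucket of the cluster; i: offset to next DCI bucket
        EB: int         # EDT of the data item in this bucket

    @dataclass
    class IndexTuple:   # the <K,B,C,EC> tuple, one per cluster in an index segment
        K: str          # cluster key value, e.g., "IBM"
        B: int          # id of the first bucket of the data cluster
        C: int          # offset from B to the first dirty bucket, or -1 if all clean
        EC: int         # EDT of the cluster key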
6. SERVER AND CLIENT PROTOCOLS TO SUPPORT ADAPTIVE BROADCAST CONTENT AND EFFICIENT RETRIEVAL

This section proposes adaptive broadcast protocols which seek an optimal balance of access time (quality of service, or average query response time) and tuning time (energy consumption). We aim to develop cooperative strategies between clients and servers for achieving such a balance. As mentioned earlier, periodicity is an important parameter in designing broadcast strategies. A periodic broadcast signifies that the broadcast "size" (i.e., number of buckets) is fixed. One can ascribe both advantages (e.g., predictability) and disadvantages (e.g., loss of flexibility) to periodicity. To study such effects, we describe two sets of protocols below, for the periodic and the aperiodic cases. We refer to the periodic protocol as the constant broadcast size (CBS) strategy, whereas the aperiodic broadcast protocol is termed the variable broadcast size (VBS) strategy. Finally, note that we support a mixed mode (this term is borrowed from [Imielinski et al. 1997]) retrieval policy: when a client arrives in a cell, she first tunes in to the broadcast to see if her DCI is already there. If not, the client explicitly sends a request to the server through the uplink for that item. Thus items may be found readily in the broadcast, or may have to be placed "on demand". This policy has been deemed the most "general" policy in the literature.

6.1 Constant Broadcast Size Strategy

We first present the server protocol, i.e., the strategy used by the server in deciding upon the broadcast content. We then present the client protocol, i.e., how the client retrieves data from the broadcast.

6.1.1 CBS Server Protocol. In this strategy, broadcast size is limited, and the broadcast is periodic. Periodicity mandates an equal size for every broadcast (recall that we consider both size and time in terms of buckets). If there are too few requested items to fill the broadcast period, the broadcast will contain dead air. On the other hand, if there are more requested items than space in the broadcast, the server must prioritize requested items to decide which to include in the broadcast set. We propose that this prioritization mechanism simultaneously satisfy two properties: popularity consciousness and avoidance of chronic starvation. Popularity consciousness means that items that are requested more often should have a greater chance of being included in the broadcast than less popular items. Avoidance of chronic starvation means that if a client requests a "less popular" item, he should not be chronically starved, i.e., the item should appear in the broadcast at some point during that client's residence in the cell.
At a minimum, our protocol attempts (but does not guarantee) to provide access to a requested data item at least once during a client's probable stay in the cell; that is, within RL time of the request. We propose a system of priority ranking of items based on two factors: a Popularity Factor (PF) and an Ignore Factor (IF). PF is computed as described in section 5. We now describe the IF feature of our model.

Ignore Factor. Consider a strategy where the inclusion priority of items is based purely on popularity, i.e., items are included in the broadcast based on a Highest-PF-First (HPF) strategy. Under such a strategy, assuming a large number of clients in a cell and a data set in which a small subset of items is very popular, clients requesting less popular items may be chronically starved. That is, items with a very low PF may be overwhelmed by the large number of requests for very popular items, and will not be broadcast while the requesting client is still in the cell. Recall that in our operating scenario, clients are highly mobile; thus, if a request is ignored for a long time, it is likely that the client will leave the cell without receiving service for that item throughout her entire residence in that cell. From a quality of service standpoint, this is highly undesirable. At the very minimum, our goal is to attempt to service a client's request at least once during her stay in the cell in which the request was made. The purpose of the Ignore Factor (IF) is to ensure that less popular but long-neglected items get an opportunity to be included in the broadcast.

We first explain the notion of IF. Assume a data item, say X, was last reported in the broadcast ending at time T0, after which it was dropped. At this point, the ignore factor of X, denoted by IF_X, is defined to be 1, the base value for any item's ignore factor (the reason for choosing a minimum IF of 1, and not 0, will be explained later). Further assume that the first request for X after time T0 arrived at time T_req. Subsequent to this point, whenever X is not reported in a broadcast, we say that X has been ignored. More specifically, the ignore factor of X at any time T_i (assuming X has not been reported until T_i), T_i > T_req, denoted by IF_X^Ti, is given by IF_X^Ti = NumBroadcast(T_i, T_req) + 1, where NumBroadcast(T_i, T_req) denotes the number of broadcasts between T_req and T_i. Since we are considering a periodic broadcast scenario, this number is easy to compute. In particular, assuming a broadcast period of P_B, NumBroadcast(T_i, T_req) = ⌊(T_i - T_req)/P_B⌋. In other words,

    IF_X^Ti = ⌊(T_i - T_req)/P_B⌋ + 1                                  (1)

When talking about the IF of a particular data item from here on, we will often drop the temporal superscript, i.e., T_i, unless the context warrants its explicit inclusion. Note that equation 1 is only valid under the constraint (T_i - T_req) ≤ RL. To see this, consider a client C who requested data item X at time T_req. Assume no other requests arrive for this data item. Then, at any time T_i following T_req, IF_X is given by equation 1 above. However, following the definition of RL, client C would be expected to depart at T_dep = T_req + RL. Thus after T_dep, there is no point in reporting X any longer, as its requester may have left. Thus, after T_dep, unless other requests for X have arrived in the meantime, IF_X should be reduced to 1 (the base value for any item's IF is 1; this is explained below).
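As a quick illustration of equation 1, with numbers of our choosing:

    # Broadcast period P_B = 100 bucket-times; first request at T_req = 250;
    # the item has been ignored until T_i = 580.
    P_B, T_req, T_i = 100, 250, 580
    IF = (T_i - T_req) // P_B + 1   # floor(330/100) + 1 = 4
    # valid only while (T_i - T_req) <= RL; beyond that, IF reverts to its base value 1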
A corollary of this is that the following inequality holds for any data item X_i:

    1 ≤ IF_X_i ≤ ⌊RL/P_B⌋ + 1

While the above discussion lays the foundation for the fundamental ideas behind the notion of ignore factor, we still have to describe a general procedure for computing it. To this end, consider a data item X for which a stream of requests arrives. In this case, computation of IF_X assumes greater complexity than discussed before. To see this, consider the case where two requests arrive for X, at times T_req1 and T_req2 respectively, where T_req2 > T_req1. Assume X is not included in any broadcast until time T_dep1 = T_req1 + RL. At T_dep1, according to the discussion in the two preceding paragraphs, IF_X = ⌊RL/P_B⌋ + 1. However, as soon as the clock ticks past T_dep1, the request that came in at T_req1 is considered void. Thus it should no longer determine IF_X. Rather, after T_dep1, IF_X should be computed based on T_req2. More specifically, at all times during the interval [T_dep1, T_dep2] (excluding the initial temporal instant), where T_dep2 = T_req2 + RL, IF_X is given by ⌊(current time - T_req2)/P_B⌋ + 1.

From the simple two-request example above, one important fact emerges: in order to maintain an accurate record of an item's IF, one must track every request that arrives for that item. With a large number of items and clients, such maintenance could be prohibitively expensive. In particular, the complexity of this process for a specific data item is of order O(N), where N is the number of requests for that item. However, we have devised a mechanism by which the IFs of items can be maintained accurately at minimal cost. In particular, as will become clear, our strategy records IF for a data item in constant time. We first provide the intuition for this process and then describe it in detail.

To see the basic idea behind our mechanism, consider figure 4, in which we show a timeline with particular points of interest.

[Fig. 4. Computation of ignore factor: broadcast epochs B1 through B4, request times T_req1 through T_req4, and departure points T_dep1 and T_dep2 on a timeline.]

The points marked with the darker lines, i.e., B1, B2, B3, B4, denote broadcast initiation epochs, i.e., instants at which successive broadcasts start. The ith request for data item X arrives at T_reqi; in figure 4, we show four requests. In this example, we assume RL = 2 P_B. Corresponding to the request points, the departure points are shown as T_depi. According to the discussion above, we would need to keep track of all four requests in order to accurately compute IF_X at any point. However, note that IF values are only used at the time the server decides on the next broadcast set, i.e., immediately preceding the broadcast epochs. Consider, as a specific example, the broadcast epoch B4. At B4, in figure 4, the request at T_req1 has expired. Note that the request at T_req2 has also expired. Moreover, we can safely say that at B4, all requests that arrived prior to B2 would have expired. In general, at all decision points where the next broadcast set has to be decided, i.e., at successive broadcast epochs, all requests that arrived prior to the ⌊RL/P_B⌋th broadcast preceding the next immediate broadcast are guaranteed to have expired. Thus, IF_X needs to be computed based on the first request that arrived after the ⌊RL/P_B⌋th broadcast preceding the next immediate broadcast. This is the key intuition. For example, at B4, IF_X needs to be computed based on T_req3.

The significance of this intuition is that the system need only keep track of the first request after each broadcast epoch, and can ignore all successive requests until the next broadcast epoch. Moreover, only the last ⌊RL/P_B⌋ epochs need to be considered. In other words, the system need only maintain a list of ⌊RL/P_B⌋ requests. This may be achieved by maintaining an ordered list in memory. Elements in this list contain both the tid and the timestamp of the request. The list has ⌊RL/P_B⌋ + 1 elements, with tids 1 through ⌊RL/P_B⌋ + 1, where the timestamp corresponding to the ith element yields the timestamp of the first request for X after the ith previous broadcast epoch with respect to the next immediate broadcast. The list is updated upon the arrival of the first request for a data item after each broadcast epoch. Such a list can be implemented as a circular queue. Using such a structure, an item can be accurately tracked in constant (⌊RL/P_B⌋) time.
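The sketch below illustrates this bookkeeping for a single data item. It is our simplified rendering of the scheme, assuming a fixed RL and P_B: only the first request after each broadcast epoch is recorded, a bounded circular queue holds the last ⌊RL/P_B⌋ + 1 of them, and IF is computed from the oldest unexpired entry via equation 1.

    from collections import deque

    class IFTracker:
        """Constant-time ignore-factor tracking for one data item."""
        def __init__(self, RL, P_B):
            self.RL, self.P_B = RL, P_B
            self.first_reqs = deque(maxlen=RL // P_B + 1)  # circular queue
            self.seen_this_epoch = False

        def on_request(self, t):
            if not self.seen_this_epoch:       # only the first request per epoch matters
                self.first_reqs.append(t)
                self.seen_this_epoch = True

        def on_broadcast_epoch(self):
            self.seen_this_epoch = False       # start recording for the new epoch

        def ignore_factor(self, now):
            # equation 1, applied to the oldest request that has not yet expired
            live = [t for t in self.first_reqs if now - t <= self.RL]
            if not live:
                return 1                       # base value: no live requests
            return (now - live[0]) // self.P_B + 1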
Priority computation using IF and PF: Having explained IF, we now turn our attention to computing the inclusion priority for data items. We propose that an item's priority be computed based on the following expression:

    Priority = IF^ASF · PF                                             (2)

where ASF is an Adaptive Scaling Factor, described below. Equation 2 also explains why we set the base value of IF at 1 and not 0. It is easily seen that, at an IF of 0, the priority value (even for popular items) would be reduced to 0, which guarantees the item's exclusion from the broadcast.

Equation 2 exploits the counteracting trends of IF and PF. Using the above expression, an item which has been requested by a large number of clients will have a large PF. It is thus likely to appear in many broadcasts, and will have a relatively low IF. On the other hand, an item with few requests will have a low PF. Based on its low PF, it is likely to be omitted from broadcasts, increasing its IF and therefore its chances of being included in a future broadcast set.

ASF is an adaptive exponential weighting factor based on a nearest neighbor approach. Its purpose is to increase the likelihood that items which have been ignored for a long time will appear in the broadcast. PF and IF differ largely in scale; if ASF is relatively low, PF dominates the priority expression (limited by the number of clients in a cell). If, however, an item has been ignored for a long time, we would like IF to dominate; a larger ASF value achieves this. In order to precisely define ASF, let W_ij denote the number of broadcasts client i has waited for item j after requesting it, and let C_j denote the set of clients who are waiting for j. Then the average wait count (AWC) for j is given by

    AWC_j = ( Σ_{i ∈ C_j} W_ij ) / |C_j|

Let the Desired Wait Count (DWC) denote a quality of service measure defined for the system: DWC denotes, on average, the maximum number of broadcasts a client can reasonably be expected to wait before receiving service for her desired item. When computing the priority of an item, we compare AWC and DWC. If AWC > DWC, ASF is incremented by unity. If AWC ≤ DWC, ASF remains unchanged. ASF is initialized to a base value of 1 for all data items, and is reset to this base value each time the item is included in the broadcast.

Having explained the underlying concepts, we are now prepared to describe the server protocol for constructing a broadcast. Prior to a broadcast epoch (the time at which a new broadcast is scheduled to begin), i.e., in its broadcast preparation stage, the server prioritizes all items which have been requested, i.e., items with a PF > 0, and sorts the items in order of descending priority. It then adds items to the broadcast set until the broadcast is full. For all requested but excluded items, IF is adjusted.
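Putting the pieces together, the broadcast preparation stage might look like the sketch below. This is our rendering, not the authors' implementation: the Item record and the greedy fill are assumptions consistent with, but not mandated by, the description above.

    from dataclasses import dataclass

    @dataclass
    class Item:
        key: str
        PF: int = 0            # popularity factor (section 5)
        IF: int = 1            # ignore factor (base value 1)
        ASF: int = 1           # adaptive scaling factor (base value 1)
        AWC: float = 0.0       # average wait count over waiting clients
        num_buckets: int = 1   # selectivity: buckets needed by this item's cluster

    def priority(item):
        return (item.IF ** item.ASF) * item.PF       # equation 2

    def prepare_broadcast(items, capacity, DWC):
        """Build the next broadcast set from all requested items."""
        requested = [x for x in items if x.PF > 0]
        for x in requested:                          # adapt the scaling factor
            if x.AWC > DWC:
                x.ASF += 1                           # long-ignored: let IF dominate
        requested.sort(key=priority, reverse=True)
        broadcast_set, used = [], 0
        for x in requested:
            if used + x.num_buckets <= capacity:     # greedy fill until full
                broadcast_set.append(x)
                used += x.num_buckets
                x.ASF = 1                            # reset upon inclusion
            # excluded items keep accruing IF via the request-tracking mechanism
        return broadcast_set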
6.1.2 CBS Client Protocols. We now describe client protocols designed to retrieve data from the broadcast intelligently, in cooperation with the server protocol defined above. When a client senses the need for a particular data item, he begins the retrieval process by tuning in to the broadcast at an arbitrary time and reading a bucket. We remind the reader that the data cluster in the broadcast that holds the item of a client's interest is referred to as the Data Cluster of Interest (DCI). The random initial probe into a continuous flow of broadcasts creates a large number of tuning possibilities; we identify eight mutually exclusive initial tuning cases. Note that because we assume a B-tree index structure, the client must start reading from the top of the index segment (i.e., the root). If she does not find a pointer to her DCI (i.e., the DCI is not in the current broadcast set), then she requests the item and tunes in to the initial index of every succeeding broadcast until either the DCI is found in a broadcast, or she departs from the cell. We discuss the cases below. We assume, for simplicity, that any probe into a broadcast always puts the client at the top of a bucket, i.e., we ignore the fact that the client may tune in at any point within a bucket. The error introduced by this, as is easily seen, is very small (0.5 bucket on average).

Case 1: Client initially tunes in to an index bucket, DCI exists in current broadcast, and the client has tuned in before DCI in the broadcast.

In this case the client tunes in to an index segment. If he has missed the root, he must tune out until the top of the next nearest index segment. This is efficiently achieved by noting the X value (described in section 5.1) from the 5-tuple of tuning information provided with every bucket. Upon reading the index, the client finds that his DCI has not yet been broadcast, and tunes out until the beginning of his data cluster of interest. At the beginning of his DCI, he tunes back in and downloads it. The retrieval protocol in this case is given in Protocol 1.

Protocol 1 CASE 1
    access top of next nearest index segment;
    read index;
    wait until beginning of DCI;
    download DCI;
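In code, Case 1 might look like the sketch below. The channel primitives (read_bucket, doze_for) and the index helpers are hypothetical placeholders for a broadcast channel interface; only the use of the X offset, the B pointer, and the N offsets follows the protocol above.

    def retrieve_case1(key, read_bucket, doze_for):
        """Case 1 sketch: initial probe hit an index bucket ahead of the DCI."""
        bucket = read_bucket()                # initial probe
        if bucket.missed_root:                # probed past the root of this segment
            doze_for(bucket.tuning.X)         # doze to next nearest index segment
            bucket = read_bucket()
        entry = bucket.lookup(key)            # walk the B-tree index from the root
        doze_for(entry.B - bucket.id)         # doze until the DCI's first bucket
        cluster = []
        while True:
            b = read_bucket()
            cluster.append(b)
            if b.tuning.N == 0:               # last data bucket of the DCI
                return cluster
            if b.tuning.N > 1:                # an index segment intervenes:
                doze_for(b.tuning.N - 1)      # follow the N offset over it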
If her DCI will appear in the next broadcast, she tunes out until the top of the next broadcast, reads the index information, and downloads her DCI as outlined in the previous case. If her DCI is scheduled to be dropped, she submits a request for it, and keeps tuning in to the top of every subsequent broadcast until a pointer to it appears in the index, or she leaves the cell. The pseudocode for Case 2, combined with that of Case 3, is shown below.

Case 3: Client initially tunes in to an index bucket, DCI does not exist in current broadcast. This case is very similar to the second subcase of Case 2. After becoming aware of the non-existence of his DCI in the current broadcast, by appropriate access into an index segment, the client submits a request for the data item, and keeps tuning in to the top of every subsequent broadcast until his DCI appears, or he leaves the cell. The protocol is as given in Protocol 2.

Protocol 2 CASE 2 & 3
    access top of an index segment;
    read index;
    if DCI missed or excluded from current broadcast then
        wait until beginning of next broadcast;
        read index;
    if DCI not in index then
        submit request for DCI;
        while DCI not in index
            wait until beginning of next broadcast;
            read index;
    wait until beginning of DCI;
    download DCI;

Case 4: Client initially tunes in to a data bucket, DCI exists in current broadcast, and the client has tuned in before DCI in the broadcast. The procedure is exactly the same as in Case 1.

Case 5: Client initially tunes in to a data bucket, DCI exists in current broadcast, and the client has tuned in after DCI in the broadcast. Same as Case 2.

Case 6: Client initially tunes in to a data bucket, DCI does not exist in current broadcast. Same as Case 3.

Case 7: Client tunes in to a data bucket in the middle of the DCI in the current broadcast. This is the most complex of all cases, and also the most important, as the same protocol would be used by clients who are interested in downloading their DCIs continually from successive broadcasts. In this case the client tunes in and becomes aware that her DCI is currently being broadcast. She immediately begins to download the data. Since she missed a portion of her DCI in the current broadcast, the remainder must be obtained from the next broadcast. At the end of the data cluster, having already determined whether her DCI is scheduled to be dropped from the next broadcast (through methods outlined earlier), she tunes out and waits for the beginning of the next broadcast. If her DCI is going to be dropped, she sends a request on the uplink and keeps tuning in to successive broadcasts until her DCI is found in the index. The more interesting case occurs when she determines that her DCI is not going to be dropped, i.e., it is going to appear in the next broadcast. In this case, as discussed in the "rapidly changing" data description in section 5, she would like to download not only the portion of her DCI missed in the previous broadcast, but also the buckets that have been modified between the broadcast epochs. This is achieved as follows. The client downloads the portion of data missed in the earlier broadcast, up to the point at which she began downloading data in the previous broadcast. She then selectively tunes in and out to retrieve any updates (following the linked list of dirty buckets) to the data downloaded in the previous broadcast, until the last dirty bucket in her data cluster of interest.
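Conceptually, this selective retrieval is a traversal of the dirty-bucket linked list. The Python sketch below is our own illustration, not part of the protocol specification: the Bucket record, the broadcast array, and the treatment of the index tuple's C value and the buckets' Z values as bucket offsets (with -1 ending the chain) are assumptions made for the example.

    # Sketch of selective retrieval across two broadcasts; all names are
    # illustrative. Z = offset to the next dirty bucket, -1 ends the chain.

    from dataclasses import dataclass

    @dataclass
    class Bucket:
        data: bytes
        z: int                       # offset to next dirty bucket; -1 if none

    def selective_download(broadcast, dci_start, index_c, prev_read_pos):
        """Download the missed prefix of the DCI, then only its dirty buckets.

        broadcast:     list of Bucket (the current broadcast)
        dci_start:     position of the DCI's first bucket
        index_c:       C value of the DCI's index tuple (offset of the first
                       dirty bucket in the cluster), or -1 if the cluster is clean
        prev_read_pos: position where downloading began in the prior broadcast
        """
        downloaded = []
        pos = dci_start
        while pos < prev_read_pos:           # 1. recover the missed prefix
            downloaded.append(broadcast[pos])
            pos += 1
        if index_c < 0:
            return downloaded                # entire data cluster is clean
        next_dirty = dci_start + index_c
        while next_dirty < pos:              # 2. skip chain entries already read
            z = broadcast[next_dirty].z
            if z == -1:
                return downloaded            # no dirty buckets beyond the prefix
            next_dirty += z
        while True:                          # 3. tune in only for dirty buckets
            bucket = broadcast[next_dirty]
            downloaded.append(bucket)
            if bucket.z == -1:
                break
            next_dirty += bucket.z
        return downloaded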
Protocol 3 and Procedure 1 provide details regarding how the selective tuning is achieved.

Protocol 3 CASE 7
    access middle of DCI;
    download remaining buckets of DCI;
    wait until beginning of next broadcast;
    read index;
    if DCI in index then
        wait until beginning of DCI;
        execute procedure DYNAMIC READ;
    else
        submit request for DCI;
        while DCI not in index
            wait until beginning of next broadcast;
            read index;
        wait until beginning of DCI;
        download DCI;

Case 8: Client tunes in to an index bucket in the middle of the DCI in the current broadcast. Same as Case 2.

6.2 Variable Broadcast Size Strategy

Having discussed a periodic broadcasting strategy, we now turn our attention to an aperiodic broadcasting scenario. We call this strategy the variable broadcast size (VBS) strategy, for obvious reasons. Note that while the broadcast size varies across broadcasts, at the start of each individual broadcast the size is known; therefore, the start of the subsequent broadcast is known as well. However, the server has no knowledge beyond that, as it does not know what requests may arrive during the current broadcast.

Procedure 1 DYNAMIC READ
    if DCI_index.C > 0 then                 /* index tuple C value of the DCI */
        next_dirty ← DCI_index.C;           /* find 1st dirty bucket */
    else
        exit;                               /* entire data cluster is clean */
    while not at point in DCI where began reading in previous broadcast
        download bucket;                    /* recover the missed portion */
        if current_bucket.id = next_dirty then
            if current_bucket.Z = -1 then
                next_dirty ← -1;            /* no further dirty buckets */
            else                            /* next dirty bucket is in the cluster */
                next_dirty ← next_dirty + current_bucket.Z;
        advance to next bucket;
    if next_dirty = -1 then                 /* the rest of DCI is clean */
        exit;
    wait until next_dirty;
    while (true)                            /* read all remaining dirty buckets in DCI */
        download bucket;
        if current_bucket.Z = -1 then
            exit;
        next_dirty ← next_dirty + current_bucket.Z;
        wait until next_dirty;

6.2.1 VBS Server Protocol. The server protocol for VBS is much simpler than that for the constant size strategy. All requested items are included, i.e., all items with a PF greater than 0 are added to the broadcast set. The broadcast length changes as items are added and deleted. Items remain in the broadcast for RL units of time from their latest request, and are subsequently dropped from the broadcast set. Within the broadcast, items (i.e., DCIs) are ordered by descending popularity. Since no item is ignored, there is no notion of an ignore factor in VBS.

6.2.2 VBS Client Protocols. The client protocols in this strategy are similar to those of the CBS strategy. The main difference between these strategies is in the client's response to finding that his DCI is not in the broadcast. Here, if his DCI is not in the broadcast, or if he has missed his DCI and the item will be dropped when the next broadcast is composed, the client requests the item and exits from the protocol. Since his DCI is guaranteed to be in the succeeding broadcast, he begins the retrieval process at the beginning of the next broadcast and finds his DCI in that broadcast.

7. AN APPROXIMATE ANALYSIS FOR THE CONSTANT AND VARIABLE BROADCAST SIZE STRATEGIES

In this section, we are primarily interested in obtaining expressions for the expected access time (AT) and tuning time (TT) for both the constant broadcast size (CBS) and the variable broadcast size (VBS) strategies. Both strategies, as the reader may recall from section 6, are presented as a set of mutually exclusive tuning scenarios.
For each scenario, we compute AT and TT. We also compute the probability of encountering each scenario. Then, using elementary probability theory, the expected access time (E[AT]) and expected tuning time (E[TT]) may be stated as:

\[
E[TT] = \sum_{\forall i} P\{\text{Scenario } i\}\, TT_i; \qquad E[AT] = \sum_{\forall i} P\{\text{Scenario } i\}\, AT_i \tag{3}
\]

In order to achieve analytical tractability, we make several assumptions in performing our analysis; thus, our results are approximate. Our assumptions are as follows: (a) clients read from the broadcast in units of buckets; (b) clients always tune in to the start (i.e., the top) of a bucket; (c) if the client tunes in to the middle of an index segment, she has missed the root of the B-tree, and must wait for the beginning of the next nearest index segment; (d) a cell contains N clients, and each client can request at most one DCI during a single broadcast period; and (e) there are two classes of data items: hot items, which have a higher probability of being requested by clients than cold items, with h percent of DCIs hot and 1 - h percent cold. In addition, certain special assumptions are made for the CBS and VBS strategies; these assumptions are stated preceding each respective analysis section. The next two subsections list our results. Table 1 lists the notation used in our analysis.

Notation   Meaning
d          size of the data to broadcast (in tuples)
S          selectivity of a data item (in buckets)
n          capacity of a bucket (tuples per bucket)
m          number of index segments (or data blocks) per broadcast
B          size of a data block (buckets per block) = d/(nm)
I          size of an index segment (in buckets) = 2d/(Sn)
T_B        size of the broadcast in CBS = m(I + B)
b          number of DCIs = d/(Sn)
C_CBS      capacity of the broadcast in CBS (in DCIs)
l          number of index buckets read per index probe = log_{n+1} I
L_1        expected distance to the DCI from the end of an index segment (buckets)
L_2        expected distance to the next broadcast from the end of an index segment (buckets)
p_h        probability that a client requests a hot item
p_c        probability that a client requests a cold item
p_{h,in}   probability that a hot item is in the broadcast
p_{c,in}   probability that a cold item is in the broadcast
P_in       probability that an item is in the broadcast
h          fraction of hot items among the d items
c          fraction of cold items among the d items
p_0        probability that a client does not request any DCI
k          total number of DCIs in the database

Table 1. Notation used in the Analysis

Our analysis is probabilistic, i.e., we compute the probabilities of certain events. The events are summarized in Table 2.

Event   Meaning
M       DCI is missed completely
M̄       DCI is downloaded completely (i.e., not missed at all)
PM      DCI is missed partially
D_i     DCI is in the ith data block
TI_i    client tunes in to the ith index segment first
TD      client tunes in to a data block first
TI      client tunes in to an index segment first
PH      client requests a hot item
PC      client requests a cold item

Table 2. Probabilistic Events
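Equation 3 is a plain expectation over mutually exclusive scenarios. As a minimal illustration (the scenario tuples below are placeholders, not derived values):

    # E[TT] and E[AT] as probability-weighted sums (Equation 3).
    # Each scenario is (probability, TT_i, AT_i); the numbers are placeholders.

    def expected_times(scenarios):
        e_tt = sum(p * tt for p, tt, _ in scenarios)
        e_at = sum(p * at for p, _, at in scenarios)
        return e_tt, e_at

    scenarios = [
        (0.55, 25.0, 40.0),    # DCI present and not missed
        (0.30, 35.0, 75.0),    # DCI present but completely missed
        (0.15, 45.0, 120.0),   # DCI absent; must be requested
    ]
    print(expected_times(scenarios))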
7.1 Derivation of Event Probabilities

We start by deriving the probabilities of the events that we consider in our analysis, outlined above in Table 2. Note that all of the aforementioned events are valid for both the CBS and the VBS strategies.

Derivation of P{M̄, TI}. Given that a client tunes in to an index segment, the probability that the client does not miss the DCI at all is denoted by P{M̄ | TI}.

\[
P\{\bar{M} \mid TI\} = \sum_{i=1}^{m-1} P\{TI_i \cap (D_{i+1} \cup D_{i+2} \cup \cdots \cup D_m)\} = \sum_{i=1}^{m-1} P\{TI_i\}\, P\{D_{i+1} \cup \cdots \cup D_m\} = \sum_{i=1}^{m-1} \frac{1}{m} \cdot \frac{m-i}{m} = \frac{m-1}{2m} \tag{4}
\]

The probability that the client tunes in to an index segment is denoted by P{TI} (the ratio of the total size of the index segments to the total broadcast size). That is,

\[
P\{TI\} = \frac{mI}{m(I+B)} = \frac{I}{I+B} = \frac{2d/(Sn)}{2d/(Sn) + d/(nm)} = \frac{2m}{2m+S} \tag{5}
\]

\[
P\{\bar{M}, TI\} = P\{\bar{M} \mid TI\}\, P\{TI\} = \frac{m-1}{2m+S} \tag{6}
\]

Derivation of P{M̄, TD}.

\[
P\{\bar{M} \mid TD\} = 1 - P\{M \mid TD\} - P\{PM \mid TD\} = 1 - \left(\frac{m-1}{2m} + \frac{B-S}{mB}\right) - \frac{S}{mB} = \frac{m-1}{2m}
\]

\[
P\{\bar{M}, TD\} = P\{\bar{M} \mid TD\}\, P\{TD\} = \frac{m-1}{2m} \cdot \frac{S}{2m+S} \tag{7}
\]

Derivation of P{M, TI}.

\[
P\{M \mid TI\} = 1 - P\{\bar{M} \mid TI\} = 1 - \frac{m-1}{2m} = \frac{m+1}{2m}, \qquad P\{M, TI\} = P\{M \mid TI\}\, P\{TI\} = \frac{m+1}{2m} \cdot \frac{2m}{2m+S} = \frac{m+1}{2m+S} \tag{8}
\]

Derivation of P{M, TD}. Given that a client tunes in to a data block, the probability that the client misses the DCI completely is denoted by P{M | TD}. As in equation (4),

\[
P\{M \mid TD\} = \sum_{i=1}^{m} P\{M \mid TD, D_i\}\, P\{D_i \mid TD\}
\]

\[
P\{M \mid TD, D_i\} = P\{M \mid j>i, TD, D_i\}\, P\{j>i \mid TD, D_i\} + P\{M \mid j=i, TD, D_i\}\, P\{j=i \mid TD, D_i\} + P\{M \mid j<i, TD, D_i\}\, P\{j<i \mid TD, D_i\}
\]

where j is the index of the data block to which the client tunes in. Thus,

\[
P\{j>i \mid TD, D_i\} = \frac{m-1}{2m}, \qquad P\{j=i \mid TD, D_i\} = \frac{1}{m}, \qquad P\{j<i \mid TD, D_i\} = \frac{m-1}{2m}
\]

Clearly,

\[
P\{M \mid j>i, TD, D_i\} = 1, \qquad P\{M \mid j=i, TD, D_i\} = \frac{B-S}{B}, \qquad P\{PM \mid j=i, TD, D_i\} = \frac{S}{B}
\]

Hence,

\[
P\{M \mid TD, D_i\} = \frac{m-1}{2m} + \frac{B-S}{B} \cdot \frac{1}{m}
\]

\[
P\{M \mid TD\} = \sum_{i=1}^{m} \left( \frac{m-1}{2m} + \frac{B-S}{mB} \right) \frac{1}{m} = \frac{m-1}{2m} + \frac{B-S}{mB} = \frac{mB + B - 2S}{2mB}
\]

\[
P\{M, TD\} = P\{M \mid TD\}\, P\{TD\} = \frac{mB + B - 2S}{2mB} \cdot \frac{S}{2m+S} \tag{9}
\]

Derivation of P{PM, TD}.

\[
P\{PM \mid TD, D_i\} = P\{PM \mid j=i, TD, D_i\}\, P\{j=i \mid TD, D_i\} = \frac{S}{B} \cdot \frac{1}{m} = \frac{S}{mB}
\]

where j is the index of the data block to which the client tunes in.

\[
P\{PM \mid TD\} = \sum_{i=1}^{m} \frac{S}{mB} \cdot \frac{1}{m} = \frac{S}{mB}, \qquad P\{PM, TD\} = P\{PM \mid TD\}\, P\{TD\} = \frac{S}{mB} \cdot \frac{S}{2m+S} \tag{10}
\]

Table 3 summarizes the probabilities derived above. Note that in the subsequent proofs, we shall repeatedly use the probabilities from Table 3.

Event        Probability
P{M̄, TI}    (m-1)/(2m+S)
P{M̄, TD}    ((m-1)/(2m)) · (S/(2m+S))
P{M, TI}     (m+1)/(2m+S)
P{M, TD}     ((m-1)/(2m) + (B-S)/(mB)) · (S/(2m+S))
P{PM}        (S/(mB)) · (S/(2m+S))

Table 3. Probabilities of Events
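As a sanity check, the probabilities of Table 3 can be evaluated numerically and verified to sum to one; a short sketch (the parameter values are illustrative only):

    # Numerical check of Table 3 for given m (index segments), S (selectivity,
    # in buckets) and B (data block size, in buckets); values are examples.

    from fractions import Fraction as F

    def event_probs(m, S, B):
        p_ti = F(2 * m, 2 * m + S)                      # P{TI}, equation (5)
        p_td = F(S, 2 * m + S)                          # P{TD} = 1 - P{TI}
        return [
            F(m - 1, 2 * m) * p_ti,                     # P{not missed, TI}  (6)
            F(m - 1, 2 * m) * p_td,                     # P{not missed, TD}  (7)
            F(m + 1, 2 * m) * p_ti,                     # P{missed, TI}      (8)
            (F(m - 1, 2 * m) + F(B - S, m * B)) * p_td, # P{missed, TD}      (9)
            F(S, m * B) * p_td,                         # P{partial miss}   (10)
        ]

    probs = event_probs(m=4, S=20, B=50)
    print([float(p) for p in probs], float(sum(probs)))  # the events sum to 1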
7.2 Analysis of Constant Broadcast Size Strategy

We first present the results of our analysis of the CBS strategy. For each Scenario i, we present the probability of encountering it (P{Scenario i}) and expressions for the tuning time and access time for that scenario, i.e., TT_i and AT_i. Our results are presented as a series of lemmas. While there are seven cases presented in section 6, there exists a substantial amount of commonality between the cases, and they may be aggregated. In the ensuing analysis, we treat the seven cases through six scenarios, termed Scenario 1, Scenario 2, ..., Scenario 6 respectively. Each of these scenarios maps to particular cases described in section 6. Specifically, Scenario 1 (presented in Lemma 3) maps to Cases 1 and 4. Scenario 2 (presented in Lemma 4) and Scenario 3 (presented in Lemma 5) map to Cases 2 and 5, while Scenario 4 (presented in Lemma 6) and Scenario 5 (presented in Lemma 7) map to subcases of Case 7. Scenario 6 (presented in Lemma 8) maps to Cases 3 and 6.

We make two special assumptions in our analysis of the CBS strategy: (a) We assume that items are priority ranked based purely on popularity, i.e., the notion of Ignore Factor (IF) is not considered. This was done because it was very difficult to include IF in a mathematically meaningful way in a single analysis; we do include IF in our simulation experiments. (b) We assume that when a request is made, the requesting client will receive her DCI before exiting the cell. Further, for simplicity, we assume that P{DCI ∈ BR_i}, the probability that a requested DCI is in the ith broadcast, is independent of P{DCI ∈ BR_{i+1}}; thus, for a given DCI, P_in remains the same throughout all broadcasts.

Definition 1. Expected Waiting Time, denoted by E[T_w], is the average amount of time (in number of buckets) that elapses between the submission of a request for a DCI by a client and the start of downloading of that DCI from a subsequent broadcast by the client.

Lemma 1. The Expected Waiting Time (E[T_w]) is given by

\[
E[T_w] = \sum_{j=0}^{\infty} \left(\tfrac{1}{2} + j\right) m(I+B) \left( p_h (1-p_{h,in})^j\, p_{h,in} + p_c (1-p_{c,in})^j\, p_{c,in} \right) \tag{11}
\]

Proof: See Appendix.

Definition 2. Pos(i) is a function that returns the ordinal position of an element i in an ordered list A, given that i is an element of A.

Lemma 2. Under the Constant Broadcast Size strategy, the probability that a DCI exists in the broadcast that the client initially tunes in to, denoted by P_in, is

\[
P_{in} = \sum_{\{B_h\}} \frac{(\beta_1 + \cdots + \beta_b)!}{\prod_{i=1}^{b} \beta_i!} \prod_{i=1}^{b} p_h^{\beta_i} + \sum_{\{B_c\}} \frac{(\beta_1 + \cdots + \beta_b)!}{\prod_{i=1}^{b} \beta_i!} \prod_{i=1}^{b} p_c^{\beta_i} \tag{12}
\]

where B_h = {(β_1, β_2, ..., β_b) | i ∈ H, Pos(i) ∈ [1, C_CBS]}, B_c = {(β_1, β_2, ..., β_b) | i ∈ C, Pos(i) ∈ [1, C_CBS]}, and β_i is the number of requests for item i. H and C are the hot and cold item sets, respectively.

Proof: See Appendix.

Lemma 3. Let Scenario 1 denote the scenario where the DCI is in the current broadcast and the client did not miss it. Under Scenario 1, the following expressions hold.

\[
P\{\text{Scenario 1}\} = P_{in} \left( \frac{m-1}{2m+S} + \frac{m-1}{2m} \cdot \frac{S}{2m+S} \right) \tag{13}
\]

\[
TT_1 = 1 + l + S \tag{14}
\]

\[
E[AT_1 \mid \text{Scenario 1}] = P_{in}\, \frac{m-1}{2m+S} \left( \frac{I}{2} + B + I + L_1 + S + \left( \frac{B}{2} + I + L_1 + S \right) \frac{S}{2m} \right) \tag{15}
\]

Proof: See Appendix.

Lemma 4. Let Scenario 2 denote the scenario where the DCI is in the current broadcast but the client completely missed it, and the DCI appears in the next broadcast. Under Scenario 2, the following expressions hold.

\[
P\{\text{Scenario 2}\} = P_{in}^2 \left( \frac{m+1}{2m+S} + \left( \frac{m-1}{2m} + \frac{B-S}{mB} \right) \frac{S}{2m+S} \right) \tag{16}
\]

\[
TT_2 = 1 + 2l + S \tag{17}
\]

\[
E[AT_2 \mid \text{Scenario 2}] = P_{in}^2\, \frac{1}{2m+S} \Big( (m-1)\Big( \frac{3I}{2} + B + L_2 + \frac{m(I+B)}{2} + S \Big) \tag{18}
\]
\[
\qquad + \Big( \frac{m-1}{2m} + \frac{B-S}{mB} \Big) S \Big( \frac{B}{2} + I + L_2 + \frac{m(I+B)}{2} + S \Big) \Big) \tag{19}
\]

Proof: See Appendix.

Lemma 5. Let Scenario 3 denote the scenario where the DCI is in the current broadcast but the client completely missed it; furthermore, the DCI will not appear in the next broadcast, and thus the client must submit a request to the server. Under Scenario 3 the following expressions hold, assuming that the DCI will appear in a broadcast before the client exits the cell.

\[
P\{\text{Scenario 3}\} = P_{in}(1 - P_{in}) \left( \frac{m+1}{2m+S} + \left( \frac{m-1}{2m} + \frac{B-S}{mB} \right) \frac{S}{2m+S} \right) \tag{20}
\]

\[
TT_3 = 1 + \frac{E[T_w]}{m(I+B)}\, l + S \tag{21}
\]

\[
AT_3 = m(I+B) \left( \frac{E[T_w]}{m(I+B)} + 1 \right) + S \tag{22}
\]

Proof: See Appendix.
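The term E[T_w]/(m(I+B)) appearing above is the expected number of broadcast periods a requesting client waits. The series in Equation 11 converges geometrically, so it is easy to evaluate by truncation; a minimal numerical sketch (the probability values are illustrative):

    # Truncated evaluation of Equation 11; parameter values are examples only.

    def expected_wait(m, I, B, ph, pc, ph_in, pc_in, terms=1000):
        period = m * (I + B)                      # broadcast period (buckets)
        total = 0.0
        for j in range(terms):
            hot = ph * (1 - ph_in) ** j * ph_in   # wait j broadcasts, hot item
            cold = pc * (1 - pc_in) ** j * pc_in  # wait j broadcasts, cold item
            total += (0.5 + j) * period * (hot + cold)
        return total

    # In the model ph + pc = 1; the in-broadcast probabilities come from Lemma 2.
    print(expected_wait(m=4, I=20, B=50, ph=0.8, pc=0.2, ph_in=0.9, pc_in=0.3))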
Lemma 6. Let Scenario 4 denote the scenario where the client partially missed the DCI in the current broadcast, and the DCI reappears in the next broadcast. Under Scenario 4, the following expressions hold.

\[
P\{\text{Scenario 4}\} = P_{in}^2\, \frac{S}{mB} \cdot \frac{S}{2m+S} \tag{23}
\]

\[
TT_4 = \frac{S}{2} + l + \frac{S}{2} \tag{24}
\]

\[
AT_4 = m(I+B) \tag{25}
\]

Proof: See Appendix.

Lemma 7. Let Scenario 5 denote the scenario where the client partially missed the DCI in the current broadcast, but the DCI does not appear in the next broadcast (it appears before the client exits the cell). Under Scenario 5, the following expressions hold.

\[
P\{\text{Scenario 5}\} = P_{in}(1 - P_{in})\, \frac{S}{mB} \cdot \frac{S}{2m+S} \tag{26}
\]

\[
TT_5 = \frac{S}{2} + l + \frac{E[T_w]}{m(I+B)}\, l + S \tag{27}
\]

\[
AT_5 = m(I+B) \left( \frac{E[T_w]}{m(I+B)} + 1 \right) + S \tag{28}
\]

Proof: See Appendix.

Lemma 8. Let Scenario 6 denote the scenario where the DCI is not in the current broadcast but appears before the client exits the cell. Under Scenario 6, the following expressions hold.

\[
P\{\text{Scenario 6}\} = 1 - P_{in} \tag{29}
\]

\[
TT_6 = 1 + l + \frac{E[T_w]}{m(I+B)}\, l + S \tag{30}
\]

\[
AT_6 = m(I+B) \left( \frac{E[T_w]}{m(I+B)} + 1 \right) + S \tag{31}
\]

7.3 Analysis of Variable Broadcast Size Strategy

In this section, we present our analysis of the variable broadcast size strategy in a series of scenarios. As in the CBS analysis presented in the previous section, these scenarios, described through a series of lemmas, are aggregations of the cases presented in section 6. Specifically, Scenario 7 (Lemma 13) refers to Case 1, Scenario 8 (Lemma 14) to Case 4, Scenario 9 (Lemma 15) to Case 2, Scenario 10 (Lemma 16) to Case 5, and Scenario 11 (Lemma 17) to Case 7. Case 3 and Case 6 are aggregated in Scenario 12 (Lemma 18). Note that E[TT] and E[AT] are calculated by multiplying the probability terms in Scenarios 7 through 11 by P{DCI ∈ BR}. Further note that in the abovementioned lemmas we do not derive the AT and TT expressions, as in the CBS case; recall that we provided a sample derivation of AT and TT in Lemma 3. We remind the reader at this point that the VBS server protocol will always place requested DCIs in the subsequent broadcast. Therefore, if a client's DCI is not in the current broadcast, she is guaranteed to find it in the next broadcast.

Lemma 9. Let L_1 denote the expected distance from the end of the index to the DCI, given that the DCI is not missed. L_1 is expressed as

\[
L_1 = E\{\text{Distance to DCI} \mid \bar{M}\} = \frac{B(m-1)}{4m} + \frac{(I+B)(m-1)(m-2)}{6m} \tag{32}
\]

Proof: See Appendix.

Lemma 10. Let L_2 denote the expected distance from the end of the index to the next broadcast, given that the DCI is missed. L_2 is expressed as

\[
L_2 = E\{\text{Distance to next broadcast} \mid M\} = \frac{1}{2m} \left( (m-1)B + (I+B)\, \frac{2m^2 - 6m + 3}{6} \right) \tag{33}
\]

Proof: See Appendix.

Lemma 11. Let P{DCI ∈ BR} denote the probability that the DCI is in the current broadcast under VBS. P{DCI ∈ BR} is given by

\[
P\{DCI \in BR\} = \left( 1 - \Big( 1 - \frac{(1-p_0)p_h}{bh} \Big)^N \right) p_h + \left( 1 - \Big( 1 - \frac{(1-p_0)p_c}{b(1-h)} \Big)^N \right) p_c \tag{34}
\]

Proof: See Appendix.

Lemma 12. Let BL denote the length of the broadcast under VBS. The expectation of BL is defined as

\[
E\{BL\} = \sum_{j=0}^{d} jS\, P\{BL = jS\} \tag{35}
\]

where

\[
P\{BL = jS\} = P\{j \text{ different DCIs are requested by the } N \text{ clients}\} = \sum_{i=0}^{j} \binom{bh}{i} \binom{b(1-h)}{j-i} \sum_{A} \frac{N!}{\prod_{q=1}^{j} \alpha_q!} \left( \frac{(1-p_0)p_h}{bh} \right)^{i + \sum_{q=1}^{i} \alpha_q} \left( \frac{(1-p_0)p_c}{b(1-h)} \right)^{j-i+\sum_{q=i+1}^{j} \alpha_q}
\]

and

\[
A = \left\{ (\alpha_1, \ldots, \alpha_j) : \alpha_q \geq 0 \text{ for all } q \in [1, j],\ \sum_{q=1}^{j} \alpha_q = N - j \right\}
\]

where α_q is the number of requests for the qth included item beyond the first.
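Lemma 11 can be evaluated directly; a small sketch (the parameter values are illustrative):

    # Probability that a client's DCI is in the current VBS broadcast (Eq. 34).

    def p_dci_in_broadcast(N, b, h, p0, ph, pc):
        hot_hit = 1 - (1 - (1 - p0) * ph / (b * h)) ** N         # hot item requested
        cold_hit = 1 - (1 - (1 - p0) * pc / (b * (1 - h))) ** N  # cold item requested
        return hot_hit * ph + cold_hit * pc

    # Example: 15 clients, 10 DCIs, 20% hot, 10% of clients request nothing.
    print(p_dci_in_broadcast(N=15, b=10, h=0.2, p0=0.1, ph=0.8, pc=0.2))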
Lemma 13. Let Scenario 7 denote the scenario where the client tunes in to an index bucket and does not miss the DCI in the current broadcast. Under Scenario 7, the following expressions hold.

\[
P\{\text{Scenario 7}\} = \frac{m-1}{2m+S} \tag{36}
\]

\[
TT_7 = 1 + l + S \tag{37}
\]

\[
AT_7 = \frac{I}{2} + B + I + L_1 + S \tag{38}
\]

Proof: See Appendix.

Lemma 14. Let Scenario 8 denote the scenario where the client tunes in to a data bucket before the data block containing the DCI in the current broadcast. Under Scenario 8, the following expressions hold.

\[
P\{\text{Scenario 8}\} = \frac{m-1}{2m} \cdot \frac{S}{2m+S} \tag{39}
\]

\[
TT_8 = 1 + l + S \tag{40}
\]

\[
AT_8 = \frac{B}{2} + I + L_1 + S \tag{41}
\]

Proof: See Appendix.

Lemma 15. Let Scenario 9 denote the scenario where the client tunes in to an index bucket but missed the DCI in the current broadcast. Under Scenario 9, the following expressions hold.

\[
P\{\text{Scenario 9}\} = \frac{m+1}{2m+S} \tag{42}
\]

\[
TT_9 = 1 + 2l + S \tag{43}
\]

\[
AT_9 = \frac{I}{2} + B + I + L_2 + \frac{m(I+B)}{2} + S \tag{44}
\]

Proof: See Appendix.

Lemma 16. Let Scenario 10 denote the scenario where the client tunes in to a data bucket but missed the DCI in the current broadcast. Under Scenario 10, the following expressions hold.

\[
P\{\text{Scenario 10}\} = \frac{mB + B - 2S}{2mB} \cdot \frac{S}{2m+S} \tag{45}
\]

\[
TT_{10} = 1 + 2l + S \tag{46}
\]

\[
AT_{10} = \frac{B}{2} + I + L_2 + \frac{m(I+B)}{2} + S \tag{47}
\]

Proof: See Appendix.

Lemma 17. Let Scenario 11 denote the scenario where the client tunes in to a data bucket in the middle of the DCI in the current broadcast. Under Scenario 11, the following expressions hold.

\[
P\{\text{Scenario 11}\} = \frac{S}{mB} \cdot \frac{S}{2m+S} \tag{48}
\]

\[
TT_{11} = \frac{S}{2} + l + \frac{S}{2} \tag{49}
\]

\[
AT_{11} = m(I+B) + \frac{S}{4} \tag{50}
\]

Proof: See Appendix.

Lemma 18. Let Scenario 12 denote the scenario where the client tunes in to either a data bucket or an index bucket, given that the DCI does not exist in the current broadcast. Under Scenario 12, the following expressions hold.

\[
P\{\text{Scenario 12}\} = 1 - \left( \left( 1 - \Big( 1 - \frac{(1-p_0)p_h}{bh} \Big)^N \right) p_h + \left( 1 - \Big( 1 - \frac{(1-p_0)p_c}{b(1-h)} \Big)^N \right) p_c \right) \tag{51}
\]

\[
TT_{12} = 1 + 2l + S \tag{52}
\]

\[
AT_{12} = m(I+B) + S \tag{53}
\]

Proof: See Appendix.

8. DISCUSSION

Based on our analyses of both the constant broadcast size and the variable broadcast size strategies, we plotted the tuning time (TT) and access time (AT) curves for various parameters. In this section, we illustrate our strategies with a small-scale example, primarily due to the computational costs involved. Although the database used in the examples consists of 200 buckets, we believe it is large enough to capture the essence of the strategies. Initially, our parameters are: b = 10, S = 20, n = 1, p_h = 0.8, h = 0.2, c = 0.8, m = 4 for VBS, and C_CBS = 4 for CBS.

8.1 Discussion of the AT and TT curves

In figures 5A and B, we illustrate the changes in the tuning and access times of the two strategies that we proposed. In both figures, we vary the number of clients in the cell, keeping the other parameters constant.

Tuning Time (TT) curves: We first turn our attention to the variable broadcast size (VBS) strategy. The VBS strategy includes every requested item in the broadcast. As the number of clients increases, the set of requested items grows as well. As a result, the size of the index segments in VBS increases. Clearly, TT is proportional to the size of the index; thus in VBS, TT increases as the number of clients increases. This explains the upward trend of the TT curve for VBS in figure 5A. Another interesting feature of the TT curve for VBS is that the slope decreases along the curve. It is easily seen that the slope of this curve is proportional to the rate of growth of the requested item set, i.e., the faster the growth of the demand set, the faster the growth of the tuning time. When the cell is sparsely populated, each additional user has a high probability of asking for new items. However, under heavier population loads, new arrivals tend to ask for items already in the request set, which slows down the growth of this curve. We call such requests redundant requests.
Fig. 5. Experimental Results: [A] TT for VBS and CBS; [B] AT for VBS and CBS

We now turn our attention to the second curve in figure 5A: the curve for the constant broadcast size (CBS) strategy. In the CBS strategy, the index size is limited because the broadcast size is limited. As the number of clients increases, the number of redundant requests, particularly for hot items, increases. Therefore, the probability that a requested item is in the broadcast (p_b) increases, lowering TT for larger client sets. For example, for N = 10, p_b is 0.830155, whereas for N = 12 it increases to 0.840821. Here, in the case where N = 12, clients have a higher probability of downloading the DCI from the current broadcast than in the case where N = 10. Clients who download the DCI from the current broadcast need not submit a request and tune in to the index of successive broadcasts, thus saving tuning time. As the number of clients increases, the number of non-redundant requests also increases, which slows the decrease in TT, as illustrated in figure 5A.

Having discussed the individual TT curves, we now focus on explaining the relative performance of the two strategies. With the current set of parameters, VBS performs slightly better than CBS for low to moderate loads in the system. However, as the load, i.e., the number of clients, increases, CBS performs better. Intuitively, this makes sense. When the number of clients grows very large, VBS will include all their requests, which will include many cold items as well as hot items. This causes indices to grow to a point where TT becomes very large. By virtue of limiting the size of the index, CBS does not suffer from this problem.

Access Time (AT) curves: In the variable broadcast size strategy, AT responds to an increasing number of clients in the same way as TT. It increases as the number of clients increases, but the increase slows as redundancy in the request set increases. This can be seen readily in figure 5B.

Initially, in the constant broadcast size strategy, the server does not receive enough requests to fill the broadcast, causing dead air in the broadcast. Clients are forced to submit requests for their DCIs and wait for the next broadcast. This contributes to the relatively high initial AT. As the broadcast starts to fill up, the effect of the increasing number of clients on AT decreases. Hot items are likely to be requested by many clients, and are therefore likely to be in the current broadcast when a client tunes in. Figure 5B shows the curve for CBS.

The two curves in figure 5B behave differently at different load levels. The VBS curve performs better than the CBS curve when the system load is low. For moderate to high loads in the system, VBS AT increases sharply while CBS AT remains fairly constant. In this regime, CBS provides lower access times for clients than VBS. Overall, it can be deduced that when the system load is low, VBS outperforms CBS for both TT and AT. For moderate loads, there is a trade-off between the two strategies: VBS performs better for TT and CBS performs better in terms of AT. For high loads in the system, CBS dominates VBS, providing lower TT and AT.
8.2 Changes in TT and AT for VBS as the number of indices in the broadcast varies

The number of index segments in the broadcast (m) is the most important parameter for VBS because the broadcast size is not limited. Increasing m provides a larger number of opportunities for clients to read the index, which tends to lower tuning times, but increases the overall size of the broadcast, which increases access times. Therefore, we plotted the TT and AT curves to compare the effects of various m values. Figures 6A and B illustrate the changes in the TT and AT curves, respectively, as m changes.

TT curves: Figure 6A shows that the change in m does not have a significant effect on TT. Since the data in the broadcast remains constant for a given client population, the index size remains constant as well. Slight differences in the curves are mainly due to round-up errors, and the TT curves converge to a single curve.

AT curves: At a given number of clients, even if the m value is different, the amount of data in the broadcast is the same. However, for different values of m, the overall broadcast size changes, since we change the number of index segments. Therefore, if m is increased, the size of the broadcast grows, thus increasing the AT.

Fig. 6. Experimental Results: [A] TT for VBS with various m values; [B] AT for VBS with various m values

8.3 Changes in TT and AT for CBS as the broadcast capacity varies

The crucial parameter in the Constant Broadcast Size strategy is the capacity of the broadcast, denoted by C_CBS. Limiting the capacity of the broadcast and favoring more popular items for inclusion reduces the access and tuning times for hot items, at the expense of colder items. The basic premise of the strategy is that with a "good" value of C_CBS, both access and tuning times may be reduced. In figures 7A and B, we illustrate the effects of changing the broadcast capacity (C_CBS) on TT and AT for the constant broadcast size strategy. Note that in the curves the parameter Capacity represents C_CBS.

TT curves: Here, we turn our attention to figure 7A, and observe the effects of changing the broadcast capacity on TT. For C_CBS = 2, p_b increases as the number of clients in the cell increases. However, a broadcast capacity of 2 DCIs is extremely small. In this scenario, hot items are continually included in the broadcast, causing the exclusion of cold items. Clients requesting cold items are forced to tune in to each successive broadcast until they exit the cell or the request becomes invalid. This results in a considerable increase in TT.

Fig. 7. Experimental Results: [A] TT for CBS with various C_CBS values; [B] AT for CBS with various C_CBS values

For C_CBS = 4, p_b is higher than in the case where C_CBS = 2.
Both hot and cold items have a higher chance of being included in the broadcast as the capacity of the broadcast grows. Therefore, clients are less starved, and the Expected Waiting Time is lower than in the case where C_CBS = 2. The number of redundant requests, particularly for hot items, increases as the number of clients increases, causing a slight decrease in the TT curve.

The curve for C_CBS = 8 is similar to the curve for C_CBS = 4. Its location above the C_CBS = 4 curve may seem counterintuitive at first; we would expect the tuning time to decrease even more by increasing the size of the broadcast. However, this is an excellent example of an unnecessarily high broadcast size. The server does not receive enough requests to fill the broadcast capacity, and broadcasts dead air.

AT curves: Expected Waiting Time is the most significant part of the AT, because clients who have missed their DCIs, or whose DCIs are not included in the current broadcast, must wait, at a minimum, until the next broadcast to retrieve their DCIs. This waiting time is not considered part of TT. The curves for AT in figure 7B are similar to those shown for TT in figure 7A. It can be concluded that, in order for CBS to perform at its best, the broadcast size parameter should be adjusted to the load in the system. This result emphasizes the importance of the adaptive capability of broadcasting strategies.

9. SIMULATION MODEL

While the results of the previous section were informative, our analytical study is subject to certain limitations, e.g., we omitted consideration of the Ignore Factor in order to make our analysis mathematically tractable. In order to overcome these limitations, we performed an empirical performance evaluation by means of an extensive simulation study. Our simulation programs were written in SIMPACK [Fishwick 1995], a C/C++ based simulation toolkit. Our simulation model contains three major components: (1) a database component, (2) a client component, and (3) a server component.

9.1 Database and Broadcast Model

Our database consists of DatabaseSize data objects (items). Information about each item is represented by a certain number of fixed-size records. As explained earlier in the paper, the records corresponding to a given data item form a data cluster in the broadcast. This structure was motivated by real-life examples. For example, in the stock market domain, data items would correspond to particular ticker symbols, while records would be the information (such as price, volume, closing price of the previous day, etc.) corresponding to these ticker symbols in various markets. Typically, stocks are traded in a fixed number of markets (e.g., Company X might be traded in 50 markets around the world); thus, for a particular data item there would be a fixed number of records, and a specific record would always convey the same basic information (e.g., the third record in the IBM cluster corresponds to IBM stock information on the Frankfurt stock market). The number of records corresponding to a data item is generated from a truncated normal distribution within the range [MinRecordsPerItem, MaxRecordsPerItem], with a mean of MeanRecordsPerItem and a standard deviation of StdRecordsPerItem. Note that this determination (i.e., the number of records per item) is done only once in the simulation and remains fixed thereafter. The sizes of data and index records in bytes are denoted by DataRecordSize and IndexRecordSize respectively.

Our database model assumes locality of reference.
A fraction of items, defined by the parameter PercentHot, are designated as hot items. These items will be more heavily requested by clients than cold items. In our model a client has a specific data item of interest. When a client arrives in the cell, she requests a hot item PercentChooseHot fraction of the time. Once it is decided whether a client requests a hot or a cold item, the specific item is picked randomly from the corresponding set.

The broadcast is modeled as shown in figure 3 and is characterized as a number of fixed-size buckets of capacity BucketSize bytes. Each bucket contains a header of tuning information: each index bucket contains a header of IndexHeaderSize bytes, while data buckets contain headers of DataHeaderSize bytes. Thus, a data cluster of D records fits into ⌈D × DataRecordSize / (BucketSize − DataHeaderSize)⌉ buckets. For indexing the broadcast, we use a (1,m) indexing scheme as described in [Imielinski et al. 1997], and organize the index tuples in a B-tree format. Clearly, each index bucket contains ⌊(BucketSize − IndexHeaderSize) / IndexRecordSize⌋ index records. In the case of the constant size strategy, the broadcast size is characterized as BroadcastSize seconds. In the case of the variable size strategy, the size of the broadcast is determined by the number of data items requested.
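The bucket arithmetic above is simple ceiling and floor division; for concreteness, a minimal sketch using the baseline byte sizes listed later in Table 5 (the constant names are ours):

    # Cluster and index bucket capacities from the formulas above, evaluated
    # with the baseline byte sizes of Table 5 (section 10).

    import math

    BUCKET_SIZE        = 128   # bytes
    DATA_HEADER_SIZE   = 8
    INDEX_HEADER_SIZE  = 8
    DATA_RECORD_SIZE   = 40
    INDEX_RECORD_SIZE  = 5

    def data_cluster_buckets(records):
        """Buckets needed for a data cluster of the given number of records."""
        payload = BUCKET_SIZE - DATA_HEADER_SIZE
        return math.ceil(records * DATA_RECORD_SIZE / payload)

    def index_records_per_bucket():
        """Index records that fit in one index bucket."""
        return (BUCKET_SIZE - INDEX_HEADER_SIZE) // INDEX_RECORD_SIZE

    print(data_cluster_buckets(150))    # 150 * 40 / 120 -> 50 buckets
    print(index_records_per_bucket())   # 120 // 5 -> 24 records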
The database is updated asynchronously. We are only concerned with what fraction of buckets, per data cluster, have their contents changed (i.e., are dirtied) per unit time. In other words, we are concerned with associating with each data item D_i a value X_i that models how the contents of D_i change. This is modeled using a stochastic variable drawn from a truncated normal distribution in the range [MinChangeFrac, MaxChangeFrac], with a mean of MeanChangeFrac and a standard deviation of StdChangeFrac. Note that if more than one broadcast passes between two successive inclusions of a data item, we are not concerned with modeling what fraction of the buckets has changed, as our retrieval protocols will assume that all data has changed and will consequently download the entire data cluster.

9.2 Client Model

Clients arrive as a Poisson process with rate ArrivRate. We vary ArrivRate to vary the load in the system. The length of time a client remains in the cell is normally distributed (truncated on the left), with a mean of MeanResidenceLatency, a standard deviation of StdResidenceLatency, and a minimum residence of MinResidenceLatency. Note that during a client's stay, he is continually consuming valuable battery resources (at varying rates, as pointed out later in this section). In order to show the energy-conserving effects of our algorithms, we track the energy consumed by clients in the course of their residence in the cell.

To track the energy consumption of clients we have constructed a detailed cost model. Each mobile client machine is equipped with various energy-consuming components, some of which are more significant than others. Our client model assumes four major components which consume energy at varying rates: the disk drive, the CPU, the display unit, and the mobile data card (MDC). We make two simplifying assumptions: (1) Clients remain powered up for their entire residence in the cell; therefore, we do not consider the cost of powering up the mobile unit. (2) Other components of the mobile unit, e.g., random access memory, consume power at a constant rate, and are therefore of little interest.

First, we describe each of the four components of interest, their possible states, energy consumption in those states, and the events that trigger state changes. We calculate energy consumption based on the equation ENERGY (Joules) = POWER (Watts) × TIME (seconds). Table 4 shows typical energy consumption rates for these components and states, based on existing product specifications and literature [Inc. 1997; Seagate Technology 1997]:

Component   State      Rate of Consumption
CPU         Active     0.25 Watts
CPU         Sleep      0.00005 Watts
Disk        Write      0.95 Watts
Disk        Idle       0.65 Watts
Disk        Sleep      0.015 Watts
Display     Active     2.5 Watts
Display     Off        0 Watts
MDC         Transmit   0.4-0.6 Watts
MDC         Receive    0.2 Watts
MDC         Sleep      0.1 Watts

Table 4. Component States and Power Consumption

Disk Drive. In this simulation we consider diskful clients. In the diskful state we assume data are cached on disk, and we are interested in disk operations and the corresponding energy usage. The disk drive of the mobile unit exists in one of three states: read/write (RW), idle (I), and sleep (S). In RW mode, the head is actively reading or writing data on the disk. In I mode, the disk is spinning, but the head is not actively reading or writing. In S mode, the disk is not spinning and consumes energy at a very low rate. We assume that the disk drive is in the I state upon the client's arrival in the cell. It enters RW mode at the start of a data download from the broadcast: while data is being downloaded from the broadcast, it is simultaneously being written to the disk-resident cache. Once the data write is over, the disk does not enter the S state immediately (in keeping with existing disk models) [Li et al. 1993]. Rather, it reverts to the I state for a short period (this is usually a tunable parameter in modern disk drives and varies between 30 seconds and 5 minutes). In our simulation, we model this phenomenon with the parameter IdlePeriod. At the end of IdlePeriod time, if no additional disk I/O has occurred, the disk stops spinning and enters the S state. If the disk is in the S state as the time approaches for a broadcast download, the disk enters the I state in preparation for the download. Waking the disk from S mode is expensive in that it requires a significant expenditure of time and energy: spinning up the disk takes SpinUp seconds at a rate of consumption of 3.0 Watts.

We note here that disk I/O may occur at a slower rate than the broadcast transmission rate. Therefore, writing to disk may end at a later time than the broadcast read ends, even though the broadcast read and the disk write commence (approximately) simultaneously.¹

CPU. The mobile unit's CPU can exist in one of two states: active (A) and sleep (S). The CPU is in A mode when any operation is taking place in the mobile unit, i.e., when any other component is in an active state. Otherwise, the CPU slips into S mode.

Display. The display component can exist in one of two states: on (N) and off (F). In the N state, the display unit is actively displaying data on the screen. In F mode, the unit displays no data and consumes no power. We assume that the display is in the F state upon the client's arrival in the cell. The display enters the N state while the mobile unit is downloading the DCI from the broadcast, and returns to the F state at the completion of the data download.
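For all components, the accounting reduces to integrating the Table 4 rates over the time spent in each state. The sketch below is our own illustration: the trace format is invented, and we take 0.5 Watts as a representative point in the MDC's 0.4-0.6 Watt transmit range.

    # ENERGY (Joules) = POWER (Watts) x TIME (seconds), summed over intervals.
    # Rates are from Table 4; the trace format is illustrative.

    RATES = {
        ("cpu", "active"): 0.25,  ("cpu", "sleep"): 0.00005,
        ("disk", "write"): 0.95,  ("disk", "idle"): 0.65, ("disk", "sleep"): 0.015,
        ("display", "on"): 2.5,   ("display", "off"): 0.0,
        ("mdc", "transmit"): 0.5, ("mdc", "receive"): 0.2, ("mdc", "sleep"): 0.1,
    }

    def energy(trace):
        """trace: list of (component, state, seconds) intervals -> Joules."""
        return sum(RATES[comp, state] * secs for comp, state, secs in trace)

    # One index probe, a sleep period, then a data download written to disk.
    trace = [
        ("mdc", "receive", 2.0), ("mdc", "sleep", 60.0), ("mdc", "receive", 10.0),
        ("disk", "write", 10.0), ("disk", "idle", 30.0), ("disk", "sleep", 22.0),
        ("display", "on", 10.0), ("cpu", "active", 22.0), ("cpu", "sleep", 50.0),
    ]
    print(f"{energy(trace):.2f} J")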
Mobile Data Card. The mobile data card (MDC) has three possible states: transmit (T), receive (R), and sleep (S). In T mode, the unit is transmitting a message to the server. In R mode, the unit is reading from the broadcast. In S mode, the unit is neither transmitting nor receiving. We assume that the mobile data card is in the R state upon the client's arrival in the cell, for the initial probe. The MDC enters the T state to send requests to the server, and returns immediately to the S state upon ending transmission. It enters the R state for index reads and data reads, and returns to the S state immediately upon completion of the read. We assume that each mobile unit is equipped with a wireless modem for accessing bandwidth on air. We assume a data rate of 19.2 kbps, the rate at which clients both receive data from the broadcast and transmit requests to the server. We base our model of the MDC on Motorola's wireless PCMCIA modem [Inc. 1997].

9.3 Server Component

To model the server, we simply implemented the broadcast generation algorithms (i.e., the server-side protocols) described earlier. We are not interested in a server-side cost model.

10. PERFORMANCE ANALYSIS

In this section, we present the results of the simulation study of our protocols described in section 6. To see how our results fare in relation to existing protocols, we have also compared our strategies against the (1,m) protocol [Imielinski et al. 1997] proposed by Imielinski, Vishwanathan, and Badrinath. Recall that we borrowed the broadcast organization structure from this strategy; it thus seemed appropriate that we compare our results against those of the (1,m) protocol (which we shall refer to as the IVB strategy from this point on). Note in this connection that this was not an easy task, as the IVB strategy neither decides broadcast content nor determines client retrieval policies. Consequently, we had to impart to the IVB strategy functionalities similar to those of our protocols. Specifically, we had to add both server-side and client-side strategies to the IVB protocol. To make the comparison fair, we use the same server-side protocol in the IVB strategy as we use in our CBS strategy (recall that both are periodic strategies). In other words, both the CBS and IVB protocols end up with virtually identical broadcast content and organization. The only difference is that the IVB index and data bucket headers differ from the CBS bucket headers: in the IVB strategy the bucket headers include the tuning information outlined in [Imielinski et al. 1997], whereas our protocol includes the tuning information discussed in section 5. We also had to decide on what kind of client-side strategies to use for IVB. Again, to make the comparison fair, we decided on using our client-side strategies with one difference: in IVB, entire data clusters are downloaded from successive broadcasts, whereas in our protocol, only the dirty buckets are downloaded (as outlined in section 6.1.2). In summary, therefore, the differences in performance (if any) between IVB and CBS would arise as a result of two factors:

- The "smart" tuning information we use.
- The selective downloading that we propose.

Note that both of the above characteristics let our protocols selectively tune in and out of the broadcast.

¹ There is a very small delay introduced in relaying the information read by the mobile interface unit to the disk writing unit.
Thus, by comparing our CBS strategy with the IVB strategy, we can determine whether this selective probing of the broadcast results in any energy conservation efficiencies. We hasten to add that this comparison will not yield any insights into the efficiencies attained as a result of our server-side broadcast generation algorithms; recall that both the CBS and IVB strategies use the same algorithms to generate broadcast content. Due to the lack of other dynamic broadcast content generation mechanisms, we are unable to provide a comparative study of the benefits of our algorithms. However, we do compare VBS and CBS. Recall that VBS, unlike CBS, does not have an explicit content determination strategy: everything that is requested is included in the broadcast. Thus, this comparison does provide some insights into the pros and cons of our server-side algorithms.

10.1 Performance Metrics

Our primary goal in this performance study is to show the efficient and energy-conserving properties of our protocols. We show these properties through two performance metrics: Access Time (AT) and Normalized Energy Expenditure (NEE).

Access Time (AT): AT measures the efficiency of our protocols. We define AT in the same way that latency is defined in [Imielinski et al. 1997], i.e., it is the amount of time that elapses between the point at which the client begins to look for his DCI in the broadcast and the point at which the client finishes his next download of that DCI. For a client who has just arrived in the cell, and has not yet downloaded his DCI, AT refers to the time between his arrival and the end of his first DCI download. After the initial read, we assume that the client would like to retrieve his DCI from every succeeding broadcast, and therefore immediately begins the process of looking for his DCI in the next broadcast. Thus, for a client who has completed his initial read, AT refers to the period between the end of one DCI download and the end of his next DCI download.

Normalized Energy Expenditure (NEE): NEE measures the energy-conserving ability of our protocols. The basic idea behind NEE is simple: it is the energy spent on average by a client to download a bucket of data. Consider a client i who spent W_i Watt-seconds of energy during her stay in the cell and accessed (downloaded) her data cluster c_i times. Further assume that this client's data cluster of interest (DCI) contained b_i buckets. Then her normalized energy expenditure, NEE_i, would be given by W_i / (c_i b_i). We are really interested in the overall NEE of the cell. Assume that during the course of the simulation we tracked a total of L clients. Then, to compute the overall NEE, we use the following formula:

\[
NEE = \frac{\sum_{i=1}^{L} W_i}{\sum_{i=1}^{L} c_i b_i}
\]

where W_i and c_i are the energy expenditure and number of downloads, respectively, of client i.

Percentage of Clients Starved (PCS): PCS is a measure of the quality of service provided by our protocols. Recall that one of the main goals in designing our protocols was to optimize the service provided to clients interested in hot items while ensuring that all clients, even those interested in cold items, receive their DCIs at least once during their stays in the cell. We define a starved client as one who departs from the cell before reading his DCI. PCS, then, can be defined as follows:

\[
PCS = \frac{NumClientsStarved}{NumClients}
\]
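Both metrics are simple aggregates over the tracked clients; a minimal sketch (the record fields W, c, and b follow the definitions above, but the records themselves are invented):

    # Overall NEE and PCS over all tracked clients (definitions above).

    def overall_nee(clients):
        """Total Joules divided by total buckets downloaded."""
        total_energy = sum(cl["W"] for cl in clients)
        total_buckets = sum(cl["c"] * cl["b"] for cl in clients)
        return total_energy / total_buckets

    def pcs(clients):
        """Fraction of clients who left the cell without reading their DCI."""
        starved = sum(1 for cl in clients if cl["c"] == 0)
        return starved / len(clients)

    clients = [
        {"W": 4200.0, "c": 12, "b": 50},   # illustrative numbers only
        {"W": 3900.0, "c": 9,  "b": 40},
        {"W": 1100.0, "c": 0,  "b": 35},   # a starved client
    ]
    print(overall_nee(clients), pcs(clients))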
Now we report the results of our experiments. We first perform a baseline experiment with some "typical" parameter values. Subsequently, we study the sensitivity of the results to the perturbation of certain "key" parameters. Finally, we consider the quality of service provided by our protocols. All the curves presented in this paper exhibit mean values that have relative half-widths about the mean of less than 10% at the 90% confidence level. Each simulation experiment was run until at least 50000 clients had departed the cell. We only discuss statistically significant differences in the ensuing performance reporting section.

10.2 Baseline Experiments

As mentioned before, we investigate three protocols for client retrieval: the Constant Broadcast Size (CBS) strategy, the Variable Broadcast Size (VBS) strategy, and the Imielinski, Vishwanathan, and Badrinath (IVB) strategy. First, we present the baseline parameter values and some justification of the values chosen. Table 5 shows the selected baseline parameter values. We note here that some parameters are only valid for certain protocols. For example, the parameter BroadcastSize, which represents the length of the broadcast in the periodic protocols CBS and IVB, has no meaning in the VBS protocol, in which the broadcast size grows and shrinks with client demand. Other such examples should be obvious.

Notation                    Baseline Value
DatabaseSize                3000 items
MeanRecordsPerItem          150 records
StdRecordsPerItem           6 records
MinRecordsPerItem           3 records
MaxRecordsPerItem           300 records
DesiredWaitCount            3 broadcasts
PercentHot                  20%
PercentChooseHot            80%
DataRecordSize              40 bytes
IndexRecordSize             5 bytes
DataHeaderSize              8 bytes
IndexHeaderSize             8 bytes
BucketSize                  128 bytes
IndexSegmentsPerBroadcast   4
BroadcastSize               1600 seconds
BucketsPerSecond            18.87 buckets/second
MeanChangeFrac              0.02
StdChangeFrac               0.001
MinChangeFrac               0
MaxChangeFrac               1
TransmissionRate            19.2 kbps
ArrivRate                   variable
MeanResidenceLatency        8000 seconds
StdResidenceLatency         26.67 seconds
IdlePeriod                  30 seconds
SpinUp                      2 seconds

Table 5. Broadcast and Cell Parameters used in the Simulation

We add a few notes to justify some of the values chosen for our parameters.

- DatabaseSize: We chose 3000 items based on the number of companies listed on the NYSE exchange (currently 2,920 companies) [NYSE 1997].
- TransmissionRate: According to Motorola product specifications, 19.2 kbps is currently the highest data rate available for wireless modems [Inc. 1997].
- IdlePeriod, SpinUp: These disk parameters were chosen based on product specifications for Seagate disk drives for laptops [Seagate Technology 1997].

Before we present the baseline results, we would like to acquaint the reader with the general outline of the ensuing discussion. We first discuss the normalized energy expenditure (NEE) metric and present the energy consumption characteristics of the different protocols. In discussing the NEE metric, we first compare the CBS and IVB protocols and then compare the CBS and VBS protocols. Why did we make this separation, rather than discuss all three together? As discussed before, the difference between CBS and IVB is the selective tuning on the part of the CBS clients; the server-side protocols are the same. Conversely, the difference between the CBS and VBS protocols is in the server-side strategies; the client-side protocols are the same. In other words, the comparison between CBS and IVB highlights the efficiency of our client protocols, while the comparison between CBS and VBS illuminates the differences between our server-end algorithms.
In order to showcase this dichotomy, we have decided to separate the presentation of the results. Having discussed the NEE features, we turn our attention to the access time (AT) curves. In discussing the AT curves we follow a format of presentation similar to that of the NEE curves, i.e., we separate the presentation of the CBS vs. IVB protocols from the discussion of the CBS vs. VBS protocols.

Fig. 8. Experimental Results: [A] NEE curves for CBS and IVB hot clients; [B] NEE curves for CBS and IVB cold clients

10.2.1 Comparing Normalized Energy Expenditure for CBS and IVB. We first present how the CBS and IVB protocols fare with respect to energy consumption. Recall that in our model, a client has a specific data item that he is interested in, and that this data item may be a hot item or a cold item. Our intuition was that the type of data item that a client was interested in would markedly impact the client's performance. This intuition was borne out by the experimental results. Accordingly, we present the NEE results for hot and cold clients in figures 8A and B respectively.

NEE hot curves for CBS and IVB. We first discuss the hot NEE curves shown in figure 8A. Notice that both the CBS and IVB curves have the same basic shape: (a) a short period of constant NEE (below an arrival rate of 0.075 clients/second), followed by (b) a gently rising NEE curve (between arrival rates of 0.075 and 0.2), and, finally, (c) a sharply rising trend in NEE values (beyond arrival rates of 0.2 clients/sec). We now explain this shape.

- At low arrival rates, the demand is not high enough to completely fill the broadcast; thus, every requested item can fit into the broadcast at these rates. Moreover, dead air is inserted into whatever unfilled broadcast period may remain after insertion of all requested data items. Thus at these rates, all clients have exactly similar access and wait patterns, explaining the flat nature of the NEE curve at these loads.
- At a certain load, the broadcast becomes full (this occurred at approximately the 0.075 clients/sec load in our experiments). After this point there is contention among the requested items for inclusion. Our broadcast period was chosen such that about 95% of the hot items in the database can fit into the broadcast simultaneously. Thus there is very little "bumping" of hot items by other hot items at these rates. More importantly, at these loads, cold items get seriously bumped, and the Ignore Factor (IF) comes into play (recall our priority computation scheme described in section 5). Because of this factor, occasionally hot items do get bumped by cold items, but this does not happen very often. The net effect is that at these loads, hot clients are likely, but no longer guaranteed, to find their items in every broadcast. This has the effect of a mild increase in waiting times for items. Even while waiting, however, a client spends energy (see the sleep mode power consumption rates in Table 4).
As load increases, progressively fewer buckets get downloaded, as clients are increasingly likely to find their items dropped from broadcasts, for the reasons outlined above. Thus, the energy consumed per bucket increases, explaining the gently rising trend of NEE at these loads.

- As mentioned in the previous items, with increasing load, more and more clients request cold (and hot) items. Hot items are included much more often in the broadcast than cold items (though some cold items occasionally make it into the broadcast). Thus, as load increases, requests for cold items are denied for longer and longer periods of time. Eventually, the load reaches a point where there are a significant number of requests for cold items that have been denied for periods greater than the quality of service measure of the system, termed DesiredWaitCount (DWC) (explained previously in section 6). When this occurs, our priority computation procedure starts awarding high priorities to these cold items, as the Adaptive Scaling Factor values increase, which then start bumping a large number of hot items from the broadcast. This causes hot clients to wait much longer to download their items, causing a large increase in NEE. Moreover, this phenomenon is exacerbated as load levels increase even further, explaining the sharply rising trend of the NEE curve.

Having explained the overall shape of the NEE hot curves, we compare the CBS NEE hot curve to the IVB NEE hot curve. The CBS protocol outperforms the IVB protocol at all load levels. This represents the energy savings obtained from the use of the smart cooperative tuning protocols of the client and the server. In other words, the selective tuning policies we advocate do make sense, and clients retrieving hot items from the database save approximately 1.6 Joules per bucket downloaded. This is non-trivial. For instance, consider that most high-end laptops are powered by Lithium Ion batteries. A Lithium Ion cell outputs 3.7 Volts for 1.250 Ampere-hours [Corporation 1997]. This translates to 16650 Joules of stored energy in one Lithium Ion cell. Now, let us make the conservative assumption that the four components included in our cost model account for 80% of the total energy consumption of a mobile unit, i.e., approximately 13320 Joules of stored energy. This translates to a savings of approximately 400 Joules (approximately 2.5% of the capacity of the Lithium Ion cell) over the course of a CBS client's residence in the cell. This savings allows CBS clients, on average, to monitor the broadcast, i.e., maintain power, through 2100 more buckets of broadcast than the average IVB client. This translates into 268800 additional bytes of data.

NEE cold curves for CBS and IVB. We move on to describe the CBS and IVB NEE cold curves, shown in figure 8B. As with the hot curves, we begin by describing the general trend of the curves. At low loads, at which client demand does not fill the broadcast, NEE cold is exactly similar to NEE hot. This constant trend, in our experiments, is visible at loads less than 0.075 clients/sec. As load increases, the broadcast fills up, and hot items begin to bump cold items from the broadcast, causing an increase in the slope of NEE, as some clients requesting cold items are forced to wait more than one broadcast period for their DCIs to (re)appear. This effect gets exacerbated with increasing load, explaining the rapidly increasing trend exhibited by the NEE curves between loads of 0.075 and 0.2.
NEE cold curves for CBS and IVB. We move on to describe the CBS and IVB NEE cold curves, shown in figure 8B. As with the hot curves, we begin by describing the general trend of the curves. At low loads, at which client demand does not fill the broadcast, NEE cold is identical to NEE hot. This constant trend, in our experiments, is visible at loads less than 0.075 clients/sec. As load increases, the broadcast fills up, and hot items begin to bump cold items from the broadcast, causing an increase in the slope of NEE as some clients requesting cold items are forced to wait more than one broadcast period for their DCIs to (re)appear. This effect gets exacerbated with increasing load, explaining the rapidly increasing trend exhibited by the NEE curves between loads of 0.075 and 0.2.

As load increases further, clients who are interested in cold items wait for periods longer than the DWC designated by the system (this is explained in detail in the hot NEE discussion above), and cold items are awarded high priorities. This forces their inclusion into the broadcast at the expense of hot items. In the hot NEE curves, this caused a sharp increase in the NEE values. For exactly the same reason, it causes a decrease in the slope of the cold NEE curve, as the availability of cold items increases.

We now compare the CBS and IVB cold NEE curves to one another. At low loads, there is a small (barely perceptible) difference between CBS NEE cold and IVB NEE cold, with CBS outperforming IVB. At this level of client demand, the hot and cold items requested by clients fit into the fixed period of the broadcast. CBS cold clients are able to read from consecutive broadcasts, realizing some energy savings from the smart tuning protocols. Once hot items begin to bump cold items from the broadcast, however, very few clients requesting cold items are able to read their DCIs from consecutive broadcasts. As remarked earlier, unless clients can retrieve from successive broadcasts, our method assumes that all data has changed and forces the download of the entire data cluster, which is exactly what the IVB protocol does. Thus, for higher loads, NEE cold for CBS and IVB are equal.

Summary. To summarize, the smart tuning aspects of our protocols make sense if there is some locality of data access patterns, i.e., if a "hot" class of data items exists. If such access patterns do not exist, selective tuning does not have an appreciable impact.

Fig. 9. Experimental Results: [A] NEE curves for CBS and VBS hot clients; [B] NEE curves for CBS and VBS cold clients. (Both panels plot NEE against arrival rate in clients/second.)

10.2.2 Comparing Normalized Energy Expenditure for CBS and VBS. Having discussed the difference between the CBS and IVB protocols, we turn our attention to highlighting the difference between the CBS and VBS protocols. Again we separate the discussion of the hot and cold clients. Figures 9A and B present the NEE curves for hot and cold clients respectively.

NEE hot curves for CBS and VBS. Let us first consider the hot curves shown in figure 9A. The first item of interest is the fact that the CBS and VBS curves have very different shapes. This leads us to believe that the server-side protocols determine the shape of the curves (note that the CBS and IVB curves, with identical server-side protocols, have the same shapes). Let us, therefore, attempt to explain the shapes of the CBS and VBS hot NEE curves. The shape of the CBS NEE hot curve has been described in the CBS vs. IVB section above (though it looks different because figures 8A and 9A are scaled differently), so we concentrate on the VBS curve. The VBS NEE hot curve is characterized by two features: (a) as load increases, NEE increases, and (b) the increase is steeper at lower loads, i.e., the slope of the NEE curve decreases with increasing load.
Both of these features are easily explained:

- It is easily seen that NEE is closely related to broadcast size: as the broadcast size increases, clients have to wait longer to download their items. However, as mentioned before, even while in sleep mode, clients continue to expend energy. As broadcast size increases, fewer data clusters are downloaded for the same residence time (as waiting time increases). Thus, NEE continues to increase.

- Now we explain the decreasing slope of the NEE curve. At low loads, any new request is likely to be a request for an item not already in the broadcast. Recall that VBS will include every requested item in the broadcast. This means that at low loads, virtually every new request will result in an increase in broadcast size, which explains the (comparatively) higher slope at low loads. As load increases, it becomes more likely that clients will request items that are already in the broadcast. Thus, at these loads, not every request results in an increase in broadcast size. This results in the decreasing slope of the NEE curve with increasing load. In fact (though this is not shown in the figure), if the load is made high enough, the VBS NEE curve virtually flattens out.

Having explained the shape of the curves, we now compare the NEE hot curves for VBS and CBS. At low loads, where client demand is not high enough to fill the CBS broadcast, VBS outperforms CBS. At these loads, the CBS protocol fills the remainder of the broadcast with dead air, while the VBS broadcast adapts its size to the lower client demand. Thus, at low loads, VBS provides data to clients at lower energy expenditure per download than CBS. At the load at which clients request just enough items to fill the CBS broadcast, VBS and CBS perform identically. The VBS NEE hot curve crosses the CBS NEE hot curve at this load, since the size and content of the VBS broadcast is the same as that of the CBS broadcast. This occurred in our experiments at approximately 0.075 clients/sec, as shown in figure 9A. At higher loads, CBS outperforms VBS on NEE hot. Here, the CBS protocol responds to the larger number of client requests by prioritizing items for inclusion in the broadcast and ignoring some items, yielding relatively stable NEE hot values. The VBS protocol, however, expands to include all requested items, causing higher NEE values.

NEE cold curves for CBS and VBS. We now describe the NEE cold curves for CBS and VBS, shown in figure 9B. As usual, we first consider the shapes. The CBS NEE cold curve is the same as that shown in figure 8B and described earlier. The VBS NEE cold curve is exactly the same as the VBS NEE hot curve, as the VBS protocol does not distinguish between hot and cold items: everything is included in the broadcast. We now compare the performance of VBS and CBS for energy consumption for clients requesting cold items. These curves, shown in figure 9B, are particularly interesting, as VBS NEE cold outperforms CBS NEE cold at virtually all loads. At low loads, the VBS protocol adapts to low client demand, while the CBS protocol transmits dead air if not enough requests are received to fill the broadcast. Eventually there comes the (by now well known) point where the broadcast sizes are identical. At this point, the VBS and CBS curves meet. From this point to a load level of approximately 0.1 clients/sec, CBS and VBS perform virtually identically, as the VBS broadcast size grows and the CBS broadcast begins bumping some items.
This situation, however, does not last. With increasing load, more and more cold items get bumped from the CBS broadcast, and client demands for cold items are increasingly left unsatisfied. In the VBS broadcast, on the other hand, while the broadcast size increases, clients are guaranteed to find their items in every broadcast. Eventually there comes a point where cold clients are starved longer under CBS than they would have to wait through the longer broadcast under VBS. Beyond this load point, which occurred at 0.11 clients/sec in our experiments, VBS exhibited a lower NEE than CBS.

Summary. In summary, for hot clients, CBS outperforms VBS at all but very low loads, i.e., loads at which the CBS broadcast is not full; there, VBS dominates. For cold clients, at all load levels, the VBS protocol either outperforms or performs identically to the CBS protocol. This makes sense, since the CBS protocol is optimized to provide better service to clients interested in more popular items at the expense of service for cold items.

10.2.3 Comparison of Access Times. Having presented the NEE results for the baseline case, we now turn our attention to analyzing the other metric, namely Access Time (AT). In presenting AT results, we follow an organization identical to the one adopted for the NEE results. But first we make an important observation that applies to all the AT results presented here. It turns out that the AT curves have a shape similar to the NEE curves, and the relative positions of these curves with respect to each other are the same as those in the NEE case. In other words, the AT results exhibit trends very similar to the NEE results. In most of the AT results presented in the literature (e.g., [Imielinski et al. 1997]), AT and tuning time (TT) show quite different forms, and TT has been the measure traditionally used to demonstrate energy efficiency. In this paper, however, instead of using TT as a measure of energy efficiency, we have used an actual energy consumption cost model, which, needless to say, adds significant accuracy to our model. Moreover, it goes to show that TT is not an accurate predictor of energy consumption. We provide the rationale for these observations below.

First of all, let us explore why the AT and NEE curves exhibit similar trends. Clearly, AT depends upon how long clients have to wait to access their data items: since AT is in units of time, an additional unit of time added to the waiting period of a client adds an additional unit of time to its AT. The computation of NEE, however, is not so simple. Simply put, clients spend energy in two ways: (a) at a higher rate in the active mode, and (b) at a constant low rate in the wait, or sleep, mode (i.e., when all four components are asleep). It is easily seen, then, that if a client were perpetually asleep (or waiting), its energy consumption would be directly proportional to time (i.e., of the form k·t, where k represents the constant energy consumption per unit time in the sleep mode). In other words, during the time when a client is asleep, its NEE curves will have exactly the same shape as its AT curves (scaled by the factor k, of course). The issue, however, is that a client is not always asleep; rather, it wakes up occasionally to download data items. Moreover, different components wake up at different times; e.g., the MDC wakes up to transmit a request when a client arrives at a cell, or to download from a broadcast; the disk wakes up to write data to the cache (the disk, however, may be asleep while the MDC is transmitting); and so on. In other words, while the system is awake (which occurs when any one or more of the four components are awake) there is variable power dissipation, i.e., non-uniform energy consumption. Clearly, therefore, if the system were awake continuously, we would observe an NEE curve quite different from the AT curve. Why, then, do we see such a similarity between the AT and NEE curves? The reason is that the system is sleeping (i.e., all components are asleep) most of the time. In our experiments, based on the parameter values chosen, clients on average slept through 90% of their residence in the cell (this of course ignores any internal computation that a client engages in, which has no bearing on our problem). This also makes perfect intuitive sense. After all, a typical client is often interested in a very small subset of all the information being broadcast. Therefore, if smart strategies are utilized (as we have demonstrated), the client can be kept in sleep mode most of the time. Thus, the sleep time power consumption dynamics greatly dominate the overall energy expenditure figures. As discussed above, when a client is sleeping its NEE curve is simply a scaled version of the AT curve. This explains why we see such marked similarities of form between the two curves.
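Schematically, the argument can be summarized as follows. The decomposition and notation here are ours, and we take the four components of the cost model to be the CPU, disk, display, and MDC; the exact cost model is the one developed earlier in the paper.

$$E \;=\; \underbrace{k\,T_{\mathrm{sleep}}}_{\text{all components asleep}} \;+\; \underbrace{\sum_{c} P_c\,T_{c}^{\mathrm{awake}}}_{\text{per-component active bursts}}, \qquad \mathrm{NEE} \;=\; \frac{E}{\text{buckets downloaded}}.$$

With $T_{\mathrm{sleep}}$ accounting for roughly 90% of residence time, the first term dominates, and NEE inherits the shape of AT up to the scale factor $k$.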
This discussion also explains why tuning time (TT) is not an accurate measure of energy expenditure. Clearly, since the "tuned-out" times greatly dominate the "tuned-in" times, TT, used by itself, mischaracterizes the nature of power dissipation in mobile clients.

Next, we show the AT curves that we generated, using the same organization as for the NEE graphs. Important note: We would like to draw the reader's attention to the fact that the shape of our empirical VBS AT curve is identical to the shape of the VBS AT curve in our analytical study, shown in figure 5B. We note that the shape of our CBS AT curve, however, does not match that of our analytical study, also shown in figure 5B. Our analytical CBS curve is flat, but our empirical CBS curve has a rising slope. This is easily explained. In our analytical study, we disregarded the notion of Ignore Factor in order to make our study mathematically tractable, while our empirical study considers the effects of Ignore Factor. This accounts for the rising slope of the CBS curve produced in our simulation study.

Access Times for CBS and IVB Protocols. The access times for the CBS and IVB protocols corresponding to hot and cold clients are shown in figures 10A and B respectively. The shapes are the same as those of the corresponding NEE curves (figures 8A and B), for reasons explained above. The noteworthy feature of these figures is that the CBS and IVB curves are identical. This is expected, as the basic CBS and IVB server protocols are identical; the only difference is smart tuning, which has no effect on access times because it is part of the client-side protocol.

Fig. 10. Experimental Results: [A] AT curves for CBS and IVB hot clients; [B] AT curves for CBS and IVB cold clients. (Both panels plot Access Time in seconds against arrival rate in clients/second.)

Access Times for CBS and VBS Protocols. The access times for the CBS and VBS protocols corresponding to hot and cold clients are shown in figures 11A and B respectively.
The shapes are the same as those of the corresponding NEE curves (figures 9A and B), for reasons explained above. For reasons of brevity we do not discuss these curves any further.

Fig. 11. Experimental Results: [A] AT curves for CBS and VBS hot clients; [B] AT curves for CBS and VBS cold clients. (Both panels plot Access Time in seconds against arrival rate in clients/second.)

10.3 Effect of varying the size of the broadcast

Having discussed our baseline results, we now turn our attention to presenting the results of some sensitivity analyses. We first discuss the effects of changing broadcast size. Clearly, this discussion is only valid for the CBS protocol, as the VBS strategy does not have a fixed broadcast size. Size is an important parameter, as it simultaneously impacts two important features of the broadcast: (a) the wait times of clients (the longer the broadcast, the longer, on average, a client must wait, even if his item of interest is included in every broadcast), and (b) the inclusion contention for items (the shorter the broadcast, the greater the number of items that must be left out). From this point on in this paper, we no longer report AT results. This saves space, and does not take away from the quality of the presentation, as the AT curves are quite similar to the NEE curves. The NEE results for these experiments corresponding to hot and cold clients are presented in figures 12A and B respectively.

Fig. 12. Experimental Results: [A] NEE curves for CBS hot clients for varying BroadcastSize values; [B] NEE curves for CBS cold clients for varying BroadcastSize values. (Both panels plot NEE against arrival rate, with curves for broadcast sizes of 1060, 1590, and 2120 seconds.)

NEE hot curves for varying the size of the broadcast. First, we discuss the NEE hot curves for varying broadcast sizes, shown in figure 12A. We note that the curve for the broadcast size of 1590 seconds is the same as the CBS NEE hot baseline curve discussed earlier in connection with figure 8A. We further remind the reader that the size of the baseline broadcast was chosen such that it accommodates virtually all of the hot items in the database. This will lead to an important conclusion at the end.

A larger broadcast size of 2120 seconds produces a much flatter NEE hot curve than the baseline curve. Since the broadcast size is larger, more items can be included in it. This leads to less contention for broadcast space, and fewer ignored items per broadcast. Thus, fewer clients are starved through multiple broadcasts in order to download their DCIs, leading to a lower slope than that of the baseline curve.
However, clients are forced to wait longer because of the increased broadcast size. This results in increased power consumption and, consequently, increased energy expenditure. Thus the curve for 2120 seconds lies above the curve for 1590 seconds, i.e., the overall NEE hot values for the larger broadcast size are higher than those of the smaller-sized broadcast. Important Note: It turns out that at an arrival rate of 1.5 clients/sec the 1590-second curve crosses, and then overtakes, the 2120-second curve. This is not shown in the figure, as average arrival rates this high are unlikely in real applications.

We also experimented with a smaller broadcast size of 1060 seconds, shown in figure 12A. A broadcast of 1060 seconds can accommodate fewer items than the baseline. Items are therefore bumped more frequently, causing more clients to wait through multiple broadcasts for their DCIs than in the baseline case. Because items are bumped more frequently, the slope of the NEE curve for the smaller broadcast size is greater than that of the baseline case.

Next we compare the NEE hot values for the 1060-second curve and the baseline curve. At low arrival rates, few clients find their items ignored at either broadcast size. Therefore the smaller-sized broadcast shows better NEE than the baseline case (for a similar reason, the baseline VBS hot NEE curve was below the CBS baseline hot NEE curve; see the discussion pertaining to figure 9A). However, with increasing load, the smaller-sized broadcast faces high item inclusion contention, resulting in a steep increase in NEE (visible around 0.16 clients/sec in our figure). Finally, the baseline curve shows better overall NEE values starting at 0.18 clients/sec.

NEE cold curves for varying the size of the broadcast. The cold curves, shown in figure 12B, are as expected: the 2120-second curve has a smaller slope, and the 1060-second curve a larger slope, than the baseline (1590-second) curve. The only difference from the hot curves is that the crossover points differ. The smallest-size curve rapidly exhibits the worst NEE, which is expected, as the cold items are rapidly bumped by the hot items owing to the small size of the broadcast. Moreover, the baseline curve exhibits higher NEE figures than the largest-size curve at moderate arrival rates. This is also expected, as the largest broadcast allows more cold items to remain in the broadcast, resulting in reduced client starvation.

Summary. In summary, the following conclusions can be drawn from this experiment. If access locality is not present, a small broadcast size is strictly non-optimal. However, in the more realistic case where access locality is indeed present, the two important determinants of broadcast size are (a) the size required to accommodate the hot items in the database, and (b) the expected system load. If system load is expected to be low, a good rule of thumb is to choose a broadcast size smaller than that required to accommodate all the hot items; in our experiments we found that a broadcast that can accommodate approximately 80% of all hot items works fairly well for low-load systems. However, if system load is expected to be moderate to high, it is near-optimal to choose a size that can accommodate all the hot clusters. One important (and somewhat counter-intuitive) lesson is that larger sizes have very little marginal benefit: the smaller inclusion contention realized as a result of larger sizes is more than outweighed by the longer wait times.
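This rule of thumb is easy to operationalize. In the sketch below, the 0.8 and 1.0 factors are taken from the summary above, while the binary low/high load split and all numeric inputs are illustrative assumptions of our own.

```python
def choose_broadcast_size(num_hot_items: int, item_airtime_s: float,
                          low_load_expected: bool) -> float:
    """Return a broadcast period (seconds) sized against the hot set."""
    hot_set_airtime = num_hot_items * item_airtime_s
    # Low load: ~80% of the hot set worked well; otherwise fit all hot items.
    factor = 0.8 if low_load_expected else 1.0
    return factor * hot_set_airtime

# Illustrative numbers only, chosen so the result equals the 1590-second
# baseline broadcast size used in the experiments.
print(choose_broadcast_size(300, 5.3, low_load_expected=False))  # -> 1590.0
```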
10.4 Effect of Varying Client Mobility

Next we study the effects of varying client mobility, i.e., we study what happens if clients, on average, stay in cells for a longer or shorter duration of time. This is achieved by varying the Residence Latency (RL) characteristic of our model. We study both the CBS and VBS algorithms in isolation, and then provide a comparative discussion.

10.4.1 CBS NEE curves for varying client mobility

NEE hot curves for varying the MeanResidenceLatency parameter. First, we describe the NEE hot curves for varying the average client's length of residence in the cell, shown in figure 13A. We note that the NEE hot curve for a MeanResidenceLatency value of 7950 seconds is the same as the baseline CBS NEE hot curve, discussed in connection with figure 8A.

Fig. 13. Experimental Results: [A] NEE curves for CBS hot clients for varying MeanResidenceLatency values; [B] NEE curves for CBS cold clients for varying MeanResidenceLatency values. (Both panels plot NEE against arrival rate, with curves for RL values of 5300, 7950, and 10600 seconds.)

We first make the observation that changing RL has the effect of altering the client load on the system. Clearly, given a specific arrival rate, if RL is increased, the client density in the cell increases, as clients stay longer. Similarly, decreasing RL has the effect of reducing system load.
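This is simply Little's law applied to the cell; the numeric example below pairs the baseline RL with one of the plotted arrival rates, a pairing of our own choosing.

$$\bar{N} \;=\; \lambda \cdot \mathrm{RL}; \qquad \text{e.g., } \lambda = 0.1\ \tfrac{\text{clients}}{\text{s}},\ \mathrm{RL} = 7950\ \mathrm{s} \;\Rightarrow\; \bar{N} \approx 795 \text{ clients in the cell.}$$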
For a longer average stay in the cell of 10600 seconds, the NEE curve has a larger slope than the baseline curve. Here, the load on the system is greater than that of the baseline case. Clients remain in the cell for a longer period of time, increasing the number of items requested. This increases contention for space in the broadcast. More clients are forced to wait through multiple broadcast periods to retrieve their DCIs, leading to higher average NEE hot values than the baseline NEE hot values.

For a shorter average stay in the cell of 5300 seconds, the NEE hot curve has a smaller slope than the baseline curve. A smaller number of clients in the cell produces a smaller number of requested items, less contention for space in the broadcast, and shorter overall waits for clients to retrieve their DCIs. This leads to lower overall NEE hot values and a flatter curve than in the baseline case.

NEE cold curves for varying the MeanResidenceLatency parameter. We now describe the NEE cold curves, shown in figure 13B. We note that the NEE cold curve for an RL value of 7950 seconds is the same as the baseline NEE cold CBS curve, discussed in connection with figure 8B. A larger RL of 10600 seconds produces a curve similar in shape to the baseline curve, but with a larger slope. As with the corresponding hot curve, the longer client stay in the cell leads to a larger number of clients in the cell. This, in turn, leads to a larger number of requested items and more contention for space in the broadcast. Clients must, on average, wait longer for their DCIs, causing both larger absolute NEE cold values and a larger slope than in the baseline case. A smaller RL of 5300 seconds leads to a smaller number of clients in the cell, and less contention for space in the broadcast. Fewer items are excluded from the broadcast than in the baseline case, so fewer clients wait through multiple broadcast periods for their DCIs. This leads to lower overall NEE cold values and a smaller slope of the NEE cold curve than in the baseline case.

Fig. 14. Experimental Results: NEE curves for VBS clients for varying MeanResidenceLatency values. (NEE against arrival rate, with curves for RL values of 5300, 7950, and 10600 seconds.)

10.4.2 VBS NEE curves for varying client mobility

NEE curves for varying the MeanResidenceLatency parameter. First, we note that the VBS protocol does not distinguish between hot and cold items, i.e., all items are included in the broadcast. This results in cold curves that are identical to the hot curves. Therefore, we do not separate the presentation of hot and cold VBS NEE curves. Next, we describe the NEE curves for varying the length of clients' residence in the cell for the VBS protocol, shown in figure 14. The curve for an RL of 7950 seconds is the same as the baseline NEE VBS curve, discussed in connection with figure 9A.

A larger RL value of 10600 seconds produces a curve similar in shape and slope to the baseline curve, but with higher absolute NEE values. Longer client stays lead to a larger number of clients, and therefore more requested items. Since the size of the VBS broadcast increases to adapt to the larger demand, the larger RL value leads to higher NEE values than in the baseline case. A smaller RL value of 5300 seconds also produces a curve similar in shape and slope to the baseline curve, but with lower absolute NEE values. Shorter client stays lead to a smaller number of clients and requested items. The VBS broadcast shrinks to accommodate the lower load on the system, which leads to shorter waits to download data. This leads to lower NEE values than in the baseline case.

A Comparative Study of CBS and VBS for varying client mobility. We first consider the curves for CBS and VBS NEE hot, shown in figure 15A. As in the baseline case, for all variations of RL, VBS hot outperforms CBS hot at low load levels, and CBS hot outperforms VBS hot at higher load levels. The main difference is the point of crossover. For a higher client density, i.e., a higher MeanResidenceLatency than baseline, the crossover point occurs at a lower load level. For a lower client density, the VBS curve crosses the CBS curve at a higher load than in the baseline case. The NEE cold curves in figure 15B are similar to the baseline curves for all variations of RL, in that VBS performs equally as well as or better than CBS.

Fig. 15. Experimental Results: [A] NEE curves for CBS and VBS hot clients for varying MeanResidenceLatency values; [B] NEE curves for CBS and VBS cold clients for varying MeanResidenceLatency values.
Summary. In summary, our experiments with varying the MeanResidenceLatency parameter show that performance degrades as the average length of a client's residence, and therefore the client density in the cell, increases. If access locality is present in the data set, VBS outperforms CBS at low loads, while CBS predominates at high loads. If there is no access locality, VBS outperforms CBS at all load levels. Finally, we note that RL is not a tunable parameter, but is instead determined by the geographic and demographic properties of the cell.

10.5 Effect of varying the number of items in the database

We now turn our attention to the effects of changing the number of items in the database. This is achieved by varying the DatabaseSize parameter in our model. We note that changing the number of items in the database changes the inclusion contention in the broadcast. A larger database increases the size of the hot set of items, which in turn increases contention for inclusion in the broadcast. Fewer items in the database, similarly, decreases contention for inclusion in the broadcast.

10.5.1 CBS curves for varying the number of items in the database

NEE hot for varying the number of items in the database. First, we describe the CBS NEE hot curves for varying database sizes, shown in figure 16A. We note that the curve for a database size of 3000 items is the same as the baseline CBS NEE hot curve, discussed earlier in connection with figure 8A.

A larger database size of 4500 items produces a curve with a much larger slope than that of the baseline curve. In this scenario, clients have a wider choice of data items, causing an increase in the overall number of requested items. This causes increased contention for space in the broadcast and longer waits for data items, as some items get bumped from the broadcast. This leads to larger NEE values at all load levels than in the baseline case.

The slope of the curve for a smaller database size of 1500 items is much flatter than that of the baseline curve. The smaller choice of items leads to a smaller number of requested items and less contention for broadcast space. Fewer clients must wait through multiple broadcasts to retrieve their DCIs, which is manifested in lower NEE values than in the baseline case.

NEE cold for varying the number of items in the database. We now discuss the CBS NEE cold curves for varying the size of the database, shown in figure 16B. We note that the curve for a database size of 3000 items is the same as the baseline CBS NEE cold curve, shown in figure 8B. The curve for a larger database size of 4500 items has a shape similar to that of the baseline curve, but a larger slope. The larger number of items vying for inclusion in the broadcast forces some items to be bumped, i.e., some clients are forced to wait longer than one broadcast period to retrieve their items. This leads to higher NEE cold values than in the baseline case.
Fig. 16. Experimental Results: [A] NEE curves for CBS hot clients for varying DatabaseSize values; [B] NEE curves for CBS cold clients for varying DatabaseSize values. (Both panels plot NEE against arrival rate, with curves for database sizes of 1500, 3000, and 4500 items.)

The curve for a smaller database size of 1500 items has a shape similar to that of the baseline curve, but is much flatter. A narrower choice of items leads to a smaller list of requested items, and less contention for broadcast space. This, in turn, results in fewer clients having to wait longer than one broadcast period to retrieve their items, which produces lower average NEE cold values than in the baseline case.

10.5.2 VBS curves for varying the number of items in the database

NEE for varying the number of items in the database. First, we remind the reader that VBS does not distinguish between hot and cold items when determining broadcast content, leading to cold curves that are identical to the hot curves. For this reason, we do not separate the presentation of hot and cold VBS NEE curves. Next, we describe the VBS NEE curves for varying database sizes, shown in figure 17. We note that the curve for a database size of 3000 items is the same as the VBS NEE baseline curve, shown in figure 9A.

Fig. 17. Experimental Results: NEE curves for VBS clients for varying DatabaseSize values. (NEE against arrival rate, with curves for database sizes of 1500, 3000, and 4500 items.)

The curve for a larger database size of 4500 items has a shape similar to that of the baseline curve, but a larger slope and higher NEE values. The larger choice of items leads to a larger list of requested items for inclusion in the broadcast. Since the VBS broadcast size adapts to the larger number of requested items, clients must wait longer to retrieve their DCIs. Longer waits lead to higher NEE values than in the baseline case. The smaller database size of 1500 items produces a curve with the same general shape as the baseline curve, but with a smaller slope and lower NEE values. The narrower choice of items leads to smaller numbers of requested items, and smaller broadcast sizes. Client waits for DCIs are shorter than in the baseline case, which is manifested in lower NEE values than in the baseline case.

A Comparative Study of CBS and VBS. We first compare the CBS and VBS NEE hot curves, shown in figure 18A. We note that the trends are similar to those of the baseline case: VBS performs better than CBS at low loads, and CBS outperforms VBS at higher loads. In addition, we see that for larger database sizes, due to higher contention for broadcast space, the VBS curve crosses the CBS curve at a lower load than in the baseline case, while the curve for a smaller database size crosses the CBS curve at a higher load than baseline.
We also note that the difference between the CBS and VBS NEE values is larger than the difference between the baseline values for larger database sizes, and smaller for smaller database sizes. We now consider the CBS and VBS NEE cold curves, shown in figure 18B. As in the baseline case, the VBS protocol performs equally as well as or better than CBS for all variations of database size. We also note that the difference between the CBS and VBS NEE values is larger than the difference in the baseline case for larger database sizes, and smaller for smaller database sizes.

Fig. 18. Experimental Results: [A] NEE curves for CBS and VBS hot clients for varying DatabaseSize values; [B] NEE curves for CBS and VBS cold clients for varying DatabaseSize values.

Summary. Our experiments show that our protocols perform better for smaller database sizes than for larger databases. Intuitively, this makes sense, since a larger list of items to be broadcast will cause either contention for broadcast space (in the CBS protocol) or a longer broadcast (in the VBS protocol). If access locality is present in the data set, VBS outperforms CBS at low loads on the system, while CBS outperforms VBS at higher loads. If no access locality is present, VBS outperforms CBS at all load levels.

10.6 Effect of varying the update rate

We now consider the effects of varying the update rate, i.e., varying the MeanChangeFrac parameter in our model. We consider the CBS and VBS protocols separately, then provide a comparative discussion.

10.6.1 CBS curves for varying the update rate

NEE hot for varying the update rate. First, we describe the CBS NEE hot curves for varying the update fraction, shown in figure 19A. We note that the curve for an update fraction of 0.02 updates per item per second is the same as the baseline CBS NEE hot curve, shown in figure 8A.

Fig. 19. Experimental Results: [A] NEE curves for CBS hot clients for varying MeanChangeFrac values; [B] NEE curves for CBS cold clients for varying MeanChangeFrac values. (Both panels plot NEE against arrival rate, with curves for MeanChangeFrac values of 0.01, 0.02, and 0.025.)

A larger update fraction of 0.025 produces a curve with a shape similar to that of the baseline case, but with higher overall NEE values and a smaller slope. More updates to an item lead to more dirty buckets for that item. Clients reading from consecutive broadcasts must, therefore, read more buckets per broadcast than in the baseline case. As load increases, fewer clients are able to read from consecutive broadcasts, since items are bumped more frequently as load increases. These clients must wait longer than one broadcast period to retrieve their DCIs, and must assume that all buckets are dirty, since they did not find their items in the previous broadcast. The client-side download rule at work here is sketched below.
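A minimal sketch of that rule, with names of our own invention; the dirty-bucket mechanics themselves are defined in the protocol sections earlier in the paper.

```python
def buckets_to_read(read_previous_broadcast: bool,
                    cluster_size: int, dirty_flags: list[bool]) -> int:
    """Number of buckets a client must download for its DCI."""
    if not read_previous_broadcast:
        # Missed a broadcast: assume every bucket changed and download the
        # entire data cluster (this is what an IVB client always does).
        return cluster_size
    # Reading consecutive broadcasts: download only the dirty buckets.
    return sum(dirty_flags)
```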
A smaller update fraction of 0.01 produces a curve with a shape similar to that of the baseline curve, but with lower overall NEE values and a larger slope. At low loads, many clients are able to retrieve their DCIs from consecutive broadcasts, and are therefore able to achieve the energy savings of the dirty-read algorithm. As load increases, fewer clients find their DCIs in consecutive broadcasts, causing the slope of the curve to increase at higher loads.

NEE cold for varying the update rate. The CBS NEE cold curves, shown in figure 19B, are very similar to the baseline NEE cold curves. We note that the curve for an update fraction of 0.02 is the same as the baseline CBS NEE cold curve. At low loads, there is some difference between the varying update fractions: the higher update fraction of 0.025 produces a higher NEE, and the lower update fraction a lower NEE. As load increases, the curves increasingly converge, since most clients are unable to retrieve their DCIs from consecutive broadcasts and must read the entire DCI when it does appear.

Fig. 20. Experimental Results: NEE curves for VBS clients for varying MeanChangeFrac values. (NEE against arrival rate, with curves for MeanChangeFrac values of 0.01, 0.02, and 0.025.)

10.6.2 VBS curves for varying the update rate

NEE for varying the update rate. First, we note that we show only one VBS NEE curve here because the VBS protocol does not distinguish between hot and cold items in deciding on broadcast content, producing cold curves that are identical to the hot curves. Therefore, we do not separate the presentation of hot and cold VBS NEE curves here. Next, we describe the VBS NEE curves for varying update rates, shown in figure 20. We note that the curve for an update rate of 0.02 updates per item per second is the same as the VBS NEE baseline curve, shown in figure 9A.

A larger update rate of 0.025 updates per item per second produces a curve similar in shape and slope to the baseline curve. At low loads, the curve for the larger update rate is slightly higher than the baseline curve. Here, clients reading only dirty buckets must download more than the same clients in the baseline case, because more updates per item have arrived. Higher loads, as explained earlier, increase the broadcast size. A larger broadcast size leads to more updates per item during those broadcasts. Thus, few buckets are "clean", and little energy savings can be achieved. Here, the curves for the larger update rate and the baseline almost overlap, because clients must read virtually all the buckets of their DCIs during the longer broadcast periods.

A smaller update rate produces a curve similar in shape and slope to the baseline curve. At low loads, the curve is slightly below the baseline curve.
Since fewer updates have been made to the data items, clients are required to read fewer buckets to keep their caches consistent. At higher loads, however, most of the buckets in the broadcast are "dirty", and clients are forced to read virtually all of the buckets of their DCIs to keep their caches consistent. Thus, at high loads, the curves for the smaller update rate and the baseline virtually overlap.

A Comparative Study of CBS and VBS. First, we note that varying the update fraction does not affect the server-side protocols, i.e., the broadcast structure is the same for all update fractions. The only difference is the number of buckets a client must read, if she is able to read from consecutive broadcasts. Thus, the relationship between CBS and VBS for both hot and cold NEE is the same as in the baseline case. These curves are shown in figures 21A and B.

Fig. 21. Experimental Results: [A] NEE curves for CBS and VBS hot clients for varying MeanChangeFrac values; [B] NEE curves for CBS and VBS cold clients for varying MeanChangeFrac values.

Summary. In summary, we note that the update fraction affects NEE through the number of buckets a client must read, rather than through the amount of time a client must wait for his DCI. Higher update fractions cause clients to read more buckets, leading to higher NEE values, while lower update fractions lead to lower NEE values.

10.7 Effectiveness of the Ignore Factor Construct

We designed our protocols to provide the best possible service for clients interested in hot items, while ensuring that all clients are able to download their DCIs at least once during their stay in the cell. This is achieved by using the Ignore Factor (IF) construct. In order to examine the effectiveness of the IF feature, we tracked the percentage of clients starved throughout their entire length of stay in the cell. Note that in VBS the question of starvation does not arise, as all requested items are included in the broadcast. Consequently, we only explore the issue of starvation with regard to the CBS scenario. Another curious feature that emerged when we started exploring this issue was that no significant starvation occurred at the client loads considered so far. In order to see significant client starvation, we had to increase the load on the system far beyond that considered in our other experiments. Thus, the arrival rates shown here represent not only those used in our baseline experiments, but also several significantly higher loads.

10.7.1 Percentage of clients starved for hot items. Since our protocols were designed to provide preferential service for clients interested in the more popular items, we were not surprised to find no significant starvation of clients interested in hot items.

10.7.2 Percentage of clients starved for cold items. We report the results of our starvation study for CBS in figure 22.
Fig. 22. Experimental Results: Percentage of cold clients starved for CBS. (Percentage of clients starved against arrival rate, which here extends to 0.5 clients/second, with curves for broadcast sizes of 1060, 1590, and 2120 seconds.)

For brevity, we make two important observations:

- For CBS clients interested in cold items, we found that smaller broadcast sizes lead to increased contention for broadcast space, and therefore to higher levels of client starvation, while larger broadcast sizes have a lower level of contention for broadcast space and less client starvation.

- While some cold clients are indeed starved, the overall percentage of clients starved is very low: in the worst case (i.e., for the highest load and smallest broadcast size we tested), only 12.2% of clients interested in cold items were unable to download their DCIs. Recalling that only 20% of clients overall are interested in cold items, we see that at most 2.4% of clients overall are starved, even at the extreme loads we considered in these experiments (the arithmetic appears below).
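The overall bound follows directly from the two quoted percentages:

$$0.122 \times 0.20 \;=\; 0.0244 \;\approx\; 2.4\%.$$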
11. CONCLUSION

In this paper, we looked at the problem of data retrieval by mobile units from wireless broadcasts. While the problem of data access in broadcasts has been investigated previously, deciding the specific content of the broadcast has remained unexplored. Specifically, in the presence of mobile clients with differing demands, the determination of the broadcast content is an important problem. The problem addressed in this paper may be captured by the following two questions:

- Given that users are highly mobile in their mobility domain, what are good strategies that the server can use to decide on broadcast content?

- Given that good broadcast strategies are found, what are good retrieval algorithms by which users can retrieve/download data from the broadcast with a minimum of energy expenditure?

In this work, we presented adaptive protocols for broadcast content determination and information retrieval. Periodicity has been regarded as an important parameter in broadcast-and-retrieve environments. We thus considered both periodic (constant broadcast size (CBS)) and aperiodic (variable broadcast size (VBS)) broadcast strategies. In the constant broadcast size strategy, we fixed the length of a broadcast and added or dropped items from the broadcast based on their popularity. We also introduced a metric called the Ignore Factor, to weigh items by how long requests for them have been ignored, so that even if they are not popular enough to be included in the broadcast, the clients requesting them are not starved. In the variable broadcast size strategy, the broadcast size changes according to the number of items requested during the previous broadcast period. Clients are guaranteed to receive the items that they requested. Since this strategy can potentially include all the items in the database, we introduced the concept of Residence Latency, which drops items from the broadcast based on the expected time that a client requesting an item will stay in the cell.

We performed an approximate yet thorough performance analysis by analytical methods. Our analysis of these strategies suggests that the aperiodic (VBS) strategy outperforms the periodic (CBS) strategy at low system loads. For moderate loads, VBS provides lower tuning times, while CBS provides lower access times. At high loads, CBS outperforms VBS overall, providing lower tuning and access times. Also, our analytical study indicates that the number of index segments in the broadcast is the most important parameter for VBS. This parameter has the greatest effect on the access time, since the broadcast size in this strategy is not limited. While informative, our analytical study is subject to certain limitations; e.g., we omitted consideration of the Ignore Factor in order to make our analysis mathematically tractable. To overcome these limitations, we developed a detailed cost model for our protocols, and evaluated the performance of our protocols through simulation.

In our baseline simulation experiments, we found that our CBS protocols produced significant energy savings over earlier protocols. We also found that, in general, the aperiodic VBS protocol outperformed the periodic CBS protocol at low loads. At higher loads, however, the CBS protocol predominated. These results confirm the basic results of our analytical study. In terms of efficiency, we discovered that, for broadcasts in which clients are interested in only a small portion of the data, trends in Access Time are virtually the same as trends in energy consumption. We noted that the standard metric for energy efficiency, Tuning Time, misrepresents energy consumption by ignoring energy use in the sleep state. For this reason, we built a detailed cost model and measured the actual energy expenditure through our Normalized Energy Expenditure metric.

In addition to our baseline experiments, we also experimented with varying some key parameters. We noted that increasing MeanResidenceLatency or DatabaseSize caused increased contention for inclusion of items in the broadcast, and therefore higher NEE values; decreasing these parameters produced lower NEE values. Conversely, increasing the BroadcastSize decreased contention for space in the broadcast, which produced NEE values that grew very slowly in proportion to the load on the system, while a smaller BroadcastSize produced NEE values that grew much faster in proportion to load. Varying the MeanChangeFrac parameter showed that, at low loads, a larger number of updates caused larger NEE values, while a smaller number of updates caused lower NEE values; this effect disappeared as load increased. Finally, we considered the percentage of clients starved across varying load levels in order to show the quality of service provided by our protocols. At normal loads, we found no significant client starvation. At much higher loads, we began to see minor starvation of clients interested in cold items, but no starvation of clients interested in hot items.

REFERENCES

Acharya, S., Alonso, R., Franklin, M., and Zdonik, S. 1995. Broadcast disks: Data management for asymmetric communication environments. In Proc. ACM SIGMOD (San Jose, California, May 1995), pp. 199-210.

Acharya, S., Franklin, M., and Zdonik, S. 1996a. Disseminating updates on broadcast disks. In Proc. 22nd VLDB Conference (Mumbai, India, September 1996), pp. 354-365.

Acharya, S., Franklin, M., and Zdonik, S. 1996b. Prefetching from a broadcast disk. In Proc. Intl. Conf. on Data Engg. (New Orleans, Louisiana, February 1996), pp. 276-285.

Acharya, S., Franklin, M., and Zdonik, S. 1997. Balancing push and pull for data broadcast. In Proc. ACM SIGMOD (Tucson, Arizona, May 1997), pp. 183-194.

Alonso, R. and Korth, H. 1993. Database issues in nomadic computing.
In Proceedings of the 1993 ACM SIGMOD (Washington, DC, May 1993), pp. 388-392.

Badrinath, B. 1997. Personal communication. Computer Science Dept., Rutgers University.

Badrinath, B. R. and Imielinski, T. 1992. Replication and mobility. In Second Workshop on the Management of Replicated Data (November 1992), pp. 9-12.

Barbara, D. and Imielinski, T. 1994. Sleepers and workaholics: Caching strategies in mobile environments. In R. Snodgrass and M. Winslett, Eds., Proceedings of the 1994 ACM SIGMOD (May 1994), pp. 1-12.

Corporation, P. 1997. Technical guide for lithium ion cell model ICR-18650. http://www.polystor.com/.product/tguide.htm.

Lam, D., Cox, D. C., Jannink, J., and Widom, J. 1995. Modeling location management in personal communication services. Technical report (October), Stanford University.

Datta, A. 1994. Research issues in databases for Active Rapidly Changing data Systems (ARCS). ACM SIGMOD Record 23, 3 (September), 8-13.

Dunham, M. H. and Helal, A. 1995. Mobile computing and databases: Anything new? SIGMOD Record 24, 4 (December), 5-9. Association for Computing Machinery.

Fishwick, P. 1995. Simulation Model Design and Execution: Building Digital Worlds. Prentice Hall.

Forman, G. H. and Zahorjan, J. 1994. The challenges of mobile computing. IEEE Computer 27, 4 (April), 38-47.

Franklin, M. 1997. Personal communication. Computer Science Dept., University of Maryland.

Huang, Y., Sistla, P., and Wolfson, O. 1994. Data replication for mobile computers. SIGMOD Record 23, 2, 13-24.

Imielinski, T. and Badrinath, B. R. 1992. Querying in highly mobile distributed environments. In Proceedings of the 18th International Conference on Very Large Databases (August 1992), pp. 41-52.

Imielinski, T. and Badrinath, B. R. 1994. Mobile wireless computing. Communications of the ACM 37, 10 (October), 19-28.

Imielinski, T., Viswanathan, S., and Badrinath, B. R. 1997. Data on air: Organization and access. IEEE Transactions on Knowledge and Data Engineering 9, 3 (May/June), 353-372.

Imielinski, T., Viswanathan, S., and Badrinath, B. R. 1994a. Energy efficient indexing on air. In Proc. ACM SIGMOD (May 1994), pp. 25-36.

Imielinski, T., Viswanathan, S., and Badrinath, B. R. 1994b. Power efficient filtering of data on air. In Proc. EDBT (March 1994), pp. 245-258.

Motorola, Inc. 1997. Personal Messenger 100C modem specifications. http://www.mot.com/MIMS/WDG/techLibraryDir/specsDir/pm100cSpecs.html.

Li, K., Kumpf, R., Horton, P., and Anderson, T. 1993. A quantitative analysis of disk drive power management in portable computers. Technical Report CSD-93-779, Computer Science Division, University of California, Berkeley.

Malyan, A. D., Ng, L. J., Leung, V. C., and Donaldson, R. W. 1993. Network architecture and signaling for wireless personal communications. IEEE Journal on Selected Areas in Communications 11, 6 (August), 830-841.

NYSE, Inc. 1997. NYSE listed companies. http://www.nyse.com/public/about/lstcotoc.html.

Krishna, P., Vaidya, N. H., and Pradhan, D. K. 1994. Location management in distributed mobile environments. In Proceedings of the Conference on Parallel and Distributed Information Systems (September 1994), pp. 81-88.

Satyanarayanan, M. 1996. Accessing information on demand at any location: Mobile information access. IEEE Personal Communications 3, 1, 26-33.

Seagate Technology, Inc. 1997. Marathon family of disc drives. http://www.seagate.com/disc/marathon/marasl.shtml.

Shivakumar, N. and Widom, J. 1994.
User profile replication for faster location lookup in mobile environments. Technical report, Stanford University.

Anantharam, V., Honig, M. L., Madhow, U., and Wei, V. K. 1994. Optimization of a database hierarchy for mobility tracking in a personal communications network. Performance Evaluation 20, 2 (May), 287-300.

Vaidya, N. H. and Hameed, S. 1996a. Data broadcast in asymmetric environments. In First International Workshop on Satellite-based Information Services (WOSBIS) (November 1996).

Vaidya, N. H. and Hameed, S. 1996b. Scheduling data broadcast in asymmetric communication environments. Technical Report 96-022 (Nov.), Computer Science Department, Texas A&M University, College Station.

Wolfson, O. and Jajodia, S. 1992. Distributed algorithms for dynamic replication of data. In Proceedings of the Symposium on Principles of Database Systems (1992), pp. 149-163.

Zdonik, S., Franklin, M., Alonso, R., and Acharya, S. 1994. Are "disks in the air" just pie in the sky? In Proc. IEEE Workshop on Mobile Computing Systems and Applications (Santa Cruz, California, December 1994).

Appendix

Proof of Lemma 1. Let $T_B = m(I+B)$ be the size of the broadcast in the Constant Broadcast Size strategy. For an arbitrary $j$,
$$P\{T_w = \tfrac{1}{2}T_B + jT_B\} \;=\; p_h(1-p_{h,in})^j\,p_{h,in} \;+\; p_c(1-p_{c,in})^j\,p_{c,in}.$$
Therefore,
$$E[T_w] \;=\; \sum_{j=0}^{\infty}\Bigl(\tfrac{1}{2}+j\Bigr)\,m(I+B)\,\bigl(p_h(1-p_{h,in})^j p_{h,in} + p_c(1-p_{c,in})^j p_{c,in}\bigr).$$

Proof of Lemma 2.
$$P_{in} \;=\; p_{h,in}\,p_h + p_{c,in}\,p_c, \qquad
p_{h,in} \;=\; \sum_{\{B_h\}} \frac{(\beta_1+\cdots+\beta_b)!}{\prod_{i=1}^{b}\beta_i!}\,\prod_{i=1}^{b} p_i^{\beta_i}, \qquad
p_{c,in} \;=\; \sum_{\{B_c\}} \frac{(\beta_1+\cdots+\beta_b)!}{\prod_{i=1}^{b}\beta_i!}\,\prod_{i=1}^{b} p_i^{\beta_i}. \tag{54}$$

Proof of Lemma 3. Lemma 3 is based on the scenario where the DCI is in the current broadcast and the client does not miss it at all. Scenario 1 is subdivided into two sub-scenarios: Scenario 1a, where the client initially tunes in to an index segment ($\overline{M}, TI$), and Scenario 1b, where the client initially tunes in to a data segment ($\overline{M}, TD$).
$$P\{\text{Scenario 1a}\} = P_{in}\,P\{\overline{M},TI\} = P_{in}\,\frac{m-1}{2m+S}, \qquad
P\{\text{Scenario 1b}\} = P_{in}\,P\{\overline{M},TD\} = P_{in}\,\frac{m-1}{2m}\cdot\frac{S}{2m+S}.$$
Since Scenario 1a and Scenario 1b are mutually exclusive events,
$$P\{\text{Scenario 1}\} = P\{\text{Scenario 1a}\} + P\{\text{Scenario 1b}\} = P_{in}\Bigl(\frac{m-1}{2m+S} + \frac{m-1}{2m}\cdot\frac{S}{2m+S}\Bigr).$$
The TT and AT are derived as follows. The client reads the current bucket and gets the address of the next nearest index segment, then reads the index tree, tuning in and out along the B-tree, and finally downloads the DCI:
$$TT_1 = 1 + l + S.$$
Under Scenario 1a, the client tunes in to the middle of an index segment, waits until the next index, reads the index, waits until the DCI, and downloads it:
$$AT_{1a} = \frac{I}{2} + B + I + L_1 + S.$$
Under Scenario 1b, the client tunes in to a data block, waits until the next index, reads the index, waits until the DCI, and downloads it:
$$AT_{1b} = \frac{B}{2} + I + L_1 + S.$$
Therefore,
$$E[AT_1 \mid \text{Scenario 1}] = P\{\text{Scenario 1a}\}\,AT_{1a} + P\{\text{Scenario 1b}\}\,AT_{1b}
= P_{in}\,\frac{m-1}{2m+S}\Bigl(\frac{I}{2} + B + I + L_1 + S + \Bigl(\frac{B}{2} + I + L_1 + S\Bigr)\frac{S}{2m}\Bigr).$$

All subsequent lemmas have a format similar to that of Lemma 3, i.e., they are concerned with deriving expressions for the probabilities of scenarios, as well as AT and TT expressions for these scenarios. In the previous lemma we derived the AT and TT expressions explicitly; in the subsequent lemmas we omit these derivations.
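Lemma 1's expectation lends itself to a quick numerical plausibility check: the closed form can be compared against a direct simulation of the waiting process. In the sketch below, all parameter values are illustrative stand-ins rather than the settings used in our experiments.

```python
import random

m, I, B = 10, 2.0, 20.0          # index segments, index size, data-segment size (assumed)
T_B = m * (I + B)                # broadcast period under CBS
p_h, p_c = 0.8, 0.2              # client wants a hot / cold item (assumed)
p_h_in, p_c_in = 0.95, 0.30      # chance the item is in a given broadcast (assumed)

def closed_form(terms: int = 10_000) -> float:
    """E[T_w] from Lemma 1, truncating the geometric tail."""
    return sum((0.5 + j) * T_B *
               (p_h * (1 - p_h_in) ** j * p_h_in +
                p_c * (1 - p_c_in) ** j * p_c_in)
               for j in range(terms))

def simulate(trials: int = 200_000) -> float:
    """Draw a client type, count broadcasts missing its DCI, accumulate waits."""
    total = 0.0
    for _ in range(trials):
        p_in = p_h_in if random.random() < p_h else p_c_in
        j = 0
        while random.random() > p_in:
            j += 1
        total += (0.5 + j) * T_B
    return total / trials

print(closed_form(), simulate())  # the two values should agree closely
```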
Proof of Lemma 4. Scenario 2 is subdivided into two sub-scenarios: Scenario 2a, where the client initially tunes in to an index segment ($M, TI$), and Scenario 2b, where the client initially tunes in to a data segment ($M, TD$).
$$P\{\text{Scenario 2a}\} = P_{in}\,P\{M,TI\}\,P_{in} = P_{in}^2\,\frac{m-1}{2m+S}, \qquad
P\{\text{Scenario 2b}\} = P_{in}\,P\{M,TD\}\,P_{in} = P_{in}^2\Bigl(\frac{m-1}{2m}+\frac{B-S}{mB}\Bigr)\frac{S}{2m+S}.$$
Since Scenario 2a and Scenario 2b are mutually exclusive events,
$$P\{\text{Scenario 2}\} = P\{\text{Scenario 2a}\} + P\{\text{Scenario 2b}\}
= P_{in}^2\Bigl(\frac{m-1}{2m+S} + \Bigl(\frac{m-1}{2m}+\frac{B-S}{mB}\Bigr)\frac{S}{2m+S}\Bigr).$$
Further,
$$AT_{2a} = \frac{3I}{2} + B + L_2 + \frac{m(I+B)}{2} + S, \qquad
AT_{2b} = \frac{B}{2} + I + L_2 + \frac{m(I+B)}{2} + S,$$
so that
$$E[AT_2 \mid \text{Scenario 2}] = P\{\text{Scenario 2a}\}\,AT_{2a} + P\{\text{Scenario 2b}\}\,AT_{2b}$$
$$= P_{in}^2\,\frac{1}{2m+S}\Bigl((m-1)\Bigl(\frac{3I}{2}+B+L_2+\frac{m(I+B)}{2}+S\Bigr)
+ \Bigl(\frac{m-1}{2m}+\frac{B-S}{mB}\Bigr)S\Bigl(\frac{B}{2}+I+L_2+\frac{m(I+B)}{2}+S\Bigr)\Bigr).$$

Proof of Lemma 5.
$$P\{\text{Scenario 3}\} = P_{in}\,P\{M\}\,(1-P_{in})
= P_{in}(1-P_{in})\Bigl(\frac{m-1}{2m+S} + \Bigl(\frac{m-1}{2m}+\frac{B-S}{mB}\Bigr)\frac{S}{2m+S}\Bigr).$$

Proof of Lemma 6.
$$P\{\text{Scenario 4}\} = P_{in}\,P\{PM\}\,P_{in} = P_{in}^2\,\frac{S}{mB}\cdot\frac{S}{2m+S}.$$

Proof of Lemma 7.
$$P\{\text{Scenario 5}\} = P_{in}\,P\{PM\}\,(1-P_{in}) = P_{in}(1-P_{in})\,\frac{S}{mB}\cdot\frac{S}{2m+S}.$$

Proof of Lemma 9. The expected distance from the end of the index to the DCI, given that the DCI has not been missed, is
$$L_1 = E\{\text{distance to DCI} \mid \overline{M}\}
= \sum_{i=2}^{m}\sum_{j=i}^{m} E\{\text{distance to DCI} \mid TI_i, D_j\}\,P\{TI_i\}\,P\{D_j\}
= \frac{1}{m^2}\sum_{i=2}^{m}\sum_{j=i}^{m}\Bigl(\frac{B}{2}+(j-i)(I+B)\Bigr)$$
$$= \frac{1}{m^2}\sum_{i=2}^{m}\Bigl(\frac{(m-i+1)B}{2} + \frac{(m-i+1)(m-i)}{2}\,(I+B)\Bigr)
= \frac{B(m-1)}{4m} + \frac{(I+B)(m-1)(m-2)}{6m}.$$

Proof of Lemma 10. The expected distance from the end of the index to the next broadcast, given that the DCI has been missed, is
$$L_2 = E\{\text{distance to next broadcast} \mid M\}
= \sum_{i=1}^{m-1}\sum_{j=i+1}^{m} E\{\text{distance to next broadcast} \mid TI_j, D_i\}\,P\{TI_j\}\,P\{D_i\}$$
$$= \frac{1}{m^2}\sum_{i=1}^{m-1}\sum_{j=i+1}^{m}\bigl(B + (I+B)(m-j)\bigr)
= \frac{1}{m^2}\sum_{i=1}^{m-1}\Bigl((m-i)B + (I+B)\,\frac{(m-i)(m-i-1)}{2}\Bigr)
= \frac{1}{2m}\Bigl((m-1)B + (I+B)\,\frac{2m^2-6m+4}{6}\Bigr).$$

Proof of Lemma 11.
$$P\{DCI \in BR\} = p_{h,in}\,p_h + p_{c,in}\,p_c
= \Bigl(1-\Bigl(1-\frac{(1-p_0)\,p_h}{b_h}\Bigr)^{N}\Bigr)p_h
+ \Bigl(1-\Bigl(1-\frac{(1-p_0)\,p_c}{b-b_h}\Bigr)^{N}\Bigr)p_c.$$

Proof of Lemma 13.
$$P\{\text{Scenario 7}\} = P\{\overline{M},TI\} = P\{\overline{M} \mid TI\}\,P\{TI\} = \frac{m-1}{2m+S}.$$

Proof of Lemma 14.
$$P\{\text{Scenario 8}\} = P\{\overline{M},TD\} = P\{\overline{M} \mid TD\}\,P\{TD\} = \frac{m-1}{2m}\cdot\frac{S}{2m+S}.$$

Proof of Lemma 15.
$$P\{\text{Scenario 9}\} = P\{M,TI\} = P\{M \mid TI\}\,P\{TI\} = \frac{m+1}{2m+S}.$$

Proof of Lemma 16.
$$P\{\text{Scenario 10}\} = P\{M,TD\} = P\{M \mid TD\}\,P\{TD\} = \frac{mB+B-2S}{2mB}\cdot\frac{S}{2m+S}.$$

Proof of Lemma 17.
$$P\{\text{Scenario 11}\} = P\{PM,TD\} = P\{PM \mid TD\}\,P\{TD\} = \frac{S}{mB}\cdot\frac{S}{2m+S}.$$

Proof of Lemma 18.
$$P\{\text{Scenario 12}\} = 1 - P\{DCI \in BR\},$$
and substitute the result of Lemma 11.
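As a consistency check on the scenario probabilities of Lemmas 13 through 17 (under our reading of the reconstructed expressions), Scenarios 7 through 11 should together be exhaustive and mutually exclusive, so their probabilities should sum to one. The following sketch verifies this symbolically.

```python
import sympy as sp

m, B, S = sp.symbols('m B S', positive=True)

P7  = (m - 1) / (2*m + S)                          # Lemma 13: index tune-in, no miss
P8  = (m - 1) / (2*m) * S / (2*m + S)              # Lemma 14: data tune-in, no miss
P9  = (m + 1) / (2*m + S)                          # Lemma 15: index tune-in, miss
P10 = (m*B + B - 2*S) / (2*m*B) * S / (2*m + S)    # Lemma 16: data tune-in, miss
P11 = S / (m*B) * S / (2*m + S)                    # Lemma 17: data tune-in, partial miss

print(sp.simplify(P7 + P8 + P9 + P10 + P11))       # prints 1
```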