From: Chandrasekar Ramachandran [cramach2@uiuc.edu] Sent: Tuesday, February 26, 2008 9:31 AM To: indy@cs.uiuc.edu Subject: 525 review 02/26 Reviews: 1. TAG: A Tiny Aggregation service for ad-hoc sensor networks, S. Madden, et al, OSDI 2002 Summary: This paper proposes a novel method of aggregating in-network queries, and executing them efficiently in a low-power TinyOS type sensor network.Two different kinds of features-namely a declarative interface for expressing queries and an efficient method of distributing and executing them is provided. The service operates on top of an existing ad-hoc networking protocol and the results are reported back via a routing tree rotted at base station. Simulation results of the TAG show a performance gain over other centralized aggregate approaches. Pros: 1. Performance gains in terms of bandwidth reduction are pronounced over centralized approaches. 2. Neat presentation style from the structure of TAG to the simulation results. 3. User-interactivity separated from low-level code in the system. Cons: 1. TAG could make use of the inherent advantages of the underlying routing protocol. ------- 2. DIFS: A distributed index for features in sensor networks, B. Greenstein et al, SNPA 2003 Summary: This paper proposes a method of index construction and evaluation over high-level semantic data which facilitates the execution of range searches in the network. This provides good load balancing and performs well when the number of queries posed is more relative to the number of events generated in the network. Pros: 1. Better scaling of DIFS as compared to other types such as quad-tree with larger networks. 2. Bottleneck nodes of DIFS are involved in a much smaller fraction of searches as compared to other approaches. Cons: 1. The applicability of the proposed method to situations when there is some failure involved in the network has not been discussed. Common Thread: In-network aggregation and its performance benefits From: qwang26@uiuc.edu Sent: Tuesday, February 26, 2008 9:23 AM To: Gupta, Indranil Subject: 525 review: in-network processing 2/26 DIFS: A distributed index for features in sensor networks In this paper, authors extent data-centric storage architecture to efficiently support range queries—that is, queries where only events with attributes in a certain range are desired. DIFS, their proposed distributed index, provides for low average search and storage communication requirements and seeks to balance these requirements over participating nodes. DIFS, based on GHT, is well-suited to scenarios where the nature of some archetypal high-level events is well defined. In such cases, efficient index structures may be applied that save on communication overhead since only data about high-level events is communicated rather than the lower-level time series data from which events are composed. DIFS extends GHT to support efficient range queries while maintaining balanced load across nodes. DIFS achieves this by constructing a multiply rooted hierarchical index that differs from traditional binary and quaternary trees in that non-root nodes can have multiple parents. Nodes stores event information for a particular range of values detected with a particular geographic region. Higher-level nodes cover smaller value ranges detected within large geographic regions while lower level nodes cover a wider range of values from within a smaller geographic region. DIFS performs best when there are many queries posed relative to the number of events generated, when each result set is small, and when the result set is not likely to be distributed over the network in such a manner as to be discovered by most leaf level search nodes. Compared with the quad tree searching algorithm, although quad tree outperforms DIFS both in terms of the aggregate communication cost of storage and of search, but is not scalable to a large number of searches or stores. DIFS scales well to large-scale networks by using a multiple rooted tree and a geographic/value coverage tradeoff that balance communication overhead over many nodes. TAG: a Tiny Aggregation service for ad-hoc sensor networks This paper describes a generic aggregation service-TAG- for ad hoc networks of TinyOS motes. TAG provides a simple, declarative interface for data collection and aggregation, inspired by selection and aggregation facilities in database query language. In addition, it intelligently distributes and executes aggregation queries in the sensor network in a time and power-efficient manner, and is sensitive to the resource constraints and lossy communication properties of wireless sensor networks. TAG processes aggregates by computing over the data as it flows through the sensors, discarding irrelevant data and combining relevant readings into more compact records when possible. In particular, when users pose aggregation queries from a basestation, queries are distributed into the network by pippybacking on the existing ad hoc network protocol. Then, sensors route data back towards the basestation through a routing tree rooted at the basestation. As data flow ups this tree, it is aggregated according to an aggregation function and value-based partitioning specified in the query. The experiment results in this paper demonstrate the advantages of TAG over traditional centralized, out-of-network methods. Authors also discuss a variety of optimizations for improving the performance and fault-tolerance for the basic solution. From: fariba.mahboobe.khan@gmail.com on behalf of Fariba Khan [fkhan2@uiuc.edu] Sent: Tuesday, February 26, 2008 9:02 AM To: Gupta, Indranil Subject: 525 review 02/26 TAG: A Tiny Aggregation service for ad-hoc sensor networks, S. Madden, et al, OSDI 2002 Summary: TAG is a smart aggregation service for ad-hoc sensor networks where nodes collaboratively execute aggregation queries similar but more complex than SQL. The nodes form a tree in terms of the user interface node that requests the data being the root and other nodes forming a tree in terms of radio strength. Queries have SQL style format with SELECT, aggregation expressions (AVG, SUM, HISTOGRAM), WHERE, GROUP BY, HAVING and EPOCH DURATION. EPOCH DURATION defines how frequently to execute this query. It runs topology discovery protocol continuously and also needs global clock synchronization for the epochs. Discussion: It is not as cool as you read it the first time after you realize they are building on directed diffusion. There contribution is TinyDB not aggregation. One big challenge with aggregation is that there are only few commonly-used ones and not all of them would be quite efficient. Nice Background on motes Asymmetric links are black listed. I found that interesting. Why? To me, it seems this is too complicated, I see blending of thing that can be at application layer into network layer and I am not convinced that this is an interface "any" data collecting sensor node could use. In many cases I think, I would want the raw data. Data is well-presented in aggregation but we never get rid of the raw data after I make a graph / table out of it. Also what if node collects data and has to run the query on the data, how fast would be SQL on a mica node, how much storage? What is the protocol used for data collection now on mica nodes? Do they have special protocol? DIFS: A distributed index for features in sensor networks, B. Greenstein et al, SNPA 2003 Summary: DIFS is (Distributed Index for Features in Sensor Networks) is a spatially distributed index for range searches. DIFS is built on top of Geographic Hash Tree (GHT) and hence provides a load-balanced structure. To summarize very simply GHT replicated data in geographically distributed nodes. Nodes from trees and parents have "higher level" aggregate view. Discussion: I am not sure if DIFS is their contribution or GHT, DIFS both. Looks like, if you once have the idea of GHT DIFS is intuitive. Similar to TAG GHT/DIFS are aiding a class of queries at higher level. Questions like "Did the region that has temp 50F-60F move west and become elongated? In their simulations they count number of hops to query information. I think it is also important to know overhead of using GHT/DIFS. From: rebolledodaniel@gmail.com on behalf of Daniel Rebolledo Samper [dreboll2@uiuc.edu] Sent: Tuesday, February 26, 2008 7:53 AM To: Gupta, Indranil Subject: 525 review 02/26 DIFS: A DISTRIBUTED INDEX FOR FEATURES IN SENSOR NETWORKS This article presents a DIFS, a distributed index for sensor networks. The authors' goal is to extend existing data-centric storage solutions (viz. Geographic Hash Tables) to allow range queries to be efficiently executed without consulting every node. The main idea is they create an extension of quad-trees: each (intermediate) node of level k has b parents of level k+1 and four children of level k. Each node is responsible for keeping a histogram of the values available in its descendants, and leaf nodes store pointers to where the data is actually located. Each of the b parents stores a range of values that is b times narrower, but the geographic extent of said data is four times larger. This allows range queries to be efficiently executed: for a given range, we only need to find the minimum number of nodes responsible for that range to obtain the data. Geographic queries are also made simpler, by selecting from the said nodes only those that are responsible for the regions in question. The system is also load-balanced as no node is singled-out as a bottleneck. The authors present an original data structure: a node has not only multiple descendants but also multiple parents. This additional design dimension is not obvious and may be useful for general (hierarchical) distributed systems design. However, I am not entirely convinced that they achieved one of their stated goals of providing low storage communication overhead and therefore low power consumption. Indeed, while the experimental evaluation shows that their system is very efficient at querying hosts for data that is very unevenly distributed (fig 5c), fig. 6 shows that their system increases message overhead by an order of magnitude. Intuitively, this means that if the sensors produce a lot of data and queries are relatively few, then we're better off using structured replication. While the load-balancing results are impressive, more experiments are required to determine if DIFS is indeed energy-efficient, and more discussion is needed to justify that the "hotspot" scenario is the most prevalent in practical WSN deployments. TAG: A TINY AGGREGATION SERVICE FOR AD-HOC SENSOR NETWORKS This paper presents the design and implementation of an SQL-like language to query wireless sensor networks. TAG uses in-network processing extensively to cut communication costs as much as possible and therefore increase battery life. The motivation of this paper is that communication is much more expensive than in-mote processing and therefore any practical application of WSNs will have to cut back on unnecessary communication. The authors observe that the data a supervisor might require from the network can often be expressed as an aggregate function over the instantaneous data or time series of the data generated by the motes. They develop an SQL-like language to express those requests and then leverage some of the results in the area of databases to distribute those queries among the motes in a way that reduces communication costs. For example, to calculate the maximum of a set of values it is not necessary for a mote to transmit all the values. It only needs to transmit the maximum of its value and those of its neighbors (in this case, children, as each query induces a tree structure). Likewise, the qualifiers WHERE and HAVING can be pushed down to the nodes to limit the information they transmit This paper is a very important and creative contribution: it is remarkable that the authors manage to implement a portion of the functionality of SQL in WSNs, and the experimental evaluation shows an order of magnitude improvement when the queries don't involve all of the data at once. However, a more detailed study of the impact on battery life in a realistic scenario would have been appropriate. From: Justin Tulloss [jmtulloss@gmail.com] Sent: Tuesday, February 26, 2008 3:52 AM To: Gupta, Indranil Subject: 525 review 02/26 Hi Indy, Here are my reviews. Thanks! Justin TAG: a Tiny Aggregation Service for Ad-Hoc Sensor Networks TAG is an interesting approach for aggregating data off of sensor networks. The designers of TAG felt that designing a high level system that allowed users to easily comb data out of a sensor network would not only make sensor networks more approachable to those that were not scientists, but also enable the lower levels of the system to adapt itself to more pressing needs of the network without interfering with the users' intentions. They designed TAG with sensor networks in mind, and as such they heavily factor in number of message transmissions and fault tolerance. They manage to reduce message transmissions by realizing that computation is much cheaper than aggregation, so whenever a node has enough knowledge to aggregate a value on its own, it does so and sends the aggregated value back. This decentralized approach drastically reduces network traffic. This, along with the high level nature of the query language, was the main goal, and they demonstrated that the system works as well or better than a centralized system in numbers of transmitted messages for every type of aggregation. I liked this paper. I like elegantly designed interfaces that are shaped around the capabilities of the underlying system while not exposing the unnecessary dirtiness of that system to the user. I feel like TAG does a brilliant job of this. They expose the data in a way that makes sense in sensor networks. This point is especially driven home by their inclusion of the epoch semantics. Epochs have little value to users in most systems, but in sensor networks, the epoch means a lot as data is usually being collected continuously. I liked the power of the interface and the way that the well designed interface mapped so perfectly on the performance characteristics they were trying to achieve. DIFS: A Distributed Index for Features in Sensor Networks DIFS is a method of indexing sensor networks to allow for more efficient and fault tolerant message distribution. It does this by taking two common distribution techniques, quad trees and geographic hash tables, and essentially making a hybrid of the two. This of course means that the resulting network is somewhere inbetween GHTs and quad trees performance wise, but it offers a nice compromise between the two when it comes to fault tolerance and speed. It usually closely follows the stronger algorithm for whatever metric you might be using to rate the scheme, which seems to define a good compromise. DIFS seems like a pretty good system. As I said above, it seems like a good compromise between GHTs and quad trees. At the same time, it isn't resoundingly different from either of them while seemingly increasing complexity quite a bit. I'm not sure that without more substantial improvements in performance this would be a scheme worth pursuing over either of the others. From: hossein.ahmadi@gmail.com on behalf of Hossein Ahmadi [hahmadi2@uiuc.edu] Sent: Tuesday, February 26, 2008 3:41 AM To: indy@cs.uiuc.edu Subject: 525 review 02/26 =============================================================================================== TAG: a Tiny AGgregation Service for Ad-Hoc Sensor Networks Samuel Madden, Michael J. Franklin, Joseph Hellerstein, and Wei Hong One of the ways to save both energy and communication resources in wireless sensor network is to process sensor data at intermediate nodes and transmit only the aggregate values to the base station. This paper presents such aggregation service for sensor networks using an SQL-like query system. Authors first define their query structure and then, they describe how queries are propagated and results are delivered back. Each query asks for a periodic report of an aggregate value in the network such as average, sum, count, or minimum. The base station broadcasts the query. Based on each hop the query travels, a routing tree back to the base station is formed. Leaf nodes generate requested data and transmit it to their parents. The parents wait for a time interval to collect all the data from the children; then, they compute the aggregate value using the appropriate function (e.g. sum). At each layer, nodes spend some time to collect values from the children, merge them, and send them to their parents. The time interval at which these operations are done depends on the application and deployment of nodes. Nodes closer to leaves have to start their collection interval sooner while nodes closer to the root wait for packets to propagate through several layers. Next, the paper simulates TAG in a loss free environment and presents the savings in terms of number of packets. Considering case of realistic links, the paper proposes some approaches to overcome the effect. One is to change a nodes parent when a very poor connection is detected. The other solutions consist of using multiple parents or caching previously received data in the event of packet loss. Currently, aggregation is one the most important aspects of sensor network make it distinguished from other network types. This paper concretely analyzes different types of aggregate functions and proposes a simple, yet effective and reliable aggregate service. However, TAG depends on one key tuning parameter: the timing for collecting and aggregating data. In a dynamically changing routing tree, it is expected that the number of children of each node and its height in the tree changes by time. Therefore, there should be a mechanism to adaptively tune timing according to those parameters. On the other hand, someone can employ retransmission in MAC layer to achieve better link delivery rate; however, it makes the MAC latency variable in TAG's point of view and consequently, the timing of packet collection harder. ----------------------------------------------------------------------------------------------- DIFS: A Distributed Index for Features in Sensor Networks Benjamin Greenstein, Deborah Estrin, Ramesh Govindan, Sylvia Ratnasamy, and Scott Shenker There have been two approaches proposed for aggregated communication in wireless sensor network. One is to use an aggregation tree rooted at the query initiator, and collect the reply all the way toward the root. The other way is to store aggregate values somewhere in the network; so, the query can easily find the aggregate reply and fetch it. Since the query needs to be flooded in the first approach, it consumed a lot of network resources. On the other hand, the second approach achieves better load balancing and energy consumption while it can only be used for queries. This paper proposes an extension to the second scheme which can handle range queries, i.e. searching by the values of data not only the key. The main idea is to construct a mutli-parent tree at which higher level nodes have the summary of values over larger area while lower level nodes keeps track of only a subarea of the parents. On the other hand, the higher level nodes store a limited range of values while each lower level nodes store the union of ranges being stored by its parents. DIFS divides area into quadrangles where there is a node storing index for that area. Each node would have tree children for three sub-quadrangles of the specified area. The node inside the quadrangle is selected by hashing the key of data as well as the geographical boundary of nodes where data is provided. In order to start a search, the query is passed downwards till it reaches the appropriate range node. Then, it is forwarded up where the range is being tightened. After that, the specific region at which the query should be forwarded can be found. Authors performed simulations in order to compare DIFS against structured replication as well as quad tree and directed diffusion. The paper shows that the cost of searching can be reduced since DIFS is the only method in which there is no need to forward the query to all nodes. On the other hand, in order to store data and index them there is a lot more nodes to be updated in DIFS and consequently, more energy consumed. The solution proposed in the paper seems elegant but the overhead of maintaining the index is one key point that should be studied in order to decide whether this approach is effective or not. The simulation results presents performance of DIFS in search and storage phases separately. However, it is not clear if the total communication cost in two phases have been reduced or not. =============================================================================================== From: Alejandro Gutierrez [agutie01@gmail.com] Sent: Tuesday, February 26, 2008 1:08 AM To: Gupta, Indranil Subject: 525 review 02/26 =============================================================== “DIFS: A Distributed Index for Features in Sensor Networks” by: Benjamin Greenstein, Deborah Estrin, Ramesh Govindan, Sylvia Ratnasamy, and Scott Shenker Reviewed by: ALEJANDRO GUTIERREZ =============================================================== This paper presents a Distributed Index for Features in Sensor networks (DIFS), a novel technique, proposed by researchers from the Department of Computer Science in University of Southern California, the International Computer Science Institute and Intel Research at Berkley. The authors present a scalable, energy-efficient and load balanced technique for range searches taking into consideration attributes in sensor networks. In the first part of the paper authors pay close attention to classify events and related properties. Afterwards they proceed to design DIFS. The hierarchical design of DIFS is very similar to a quad-tree; an interesting fact is that it combines older techniques. The authors take a lot of careful explanation on each of the components of DIFS, especially the operations that take place. The system presents a new approach in the design space of data-centric storage by carefully balancing space and the cost of executing each query. I would have liked more analysis of the results, because from what the authors present, they seem to be very promising. Advantages: - Performance wise the system behaves better with a bigger amount of queries than events, which might indicate promising ground for scalability. Disadvantages: - Hypothetically DIFS can be very inefficient when handling queries over larges ranges, which is a fact I think the authors should take into consideration in order to guarantee scalability of their systems. =============================================================== “TAG: a Tiny Aggregation Service for Ad-hoc Sensor Networks” by: Samuel Madden, Michael J. Franklin, Joseph Hellerstein, and Wei Hong Reviewed by: ALEJANDRO GUTIERREZ =============================================================== This paper presents an in-network aggregation service for sensor networks called TAG developed by researchers at Intel Research Labs in Berkley. TAG extracts data from wireless sensor networks through a simple to use interface. One of the services it offers is retrieving data through a simple interface. On the other hand it also distributes and processes queries over the collected data in an optimal way. One fact I like is how the interface uses SQL to manipulate its database operations. I personally think this makes the system highly adaptable for future work. They are capable of using data aggregates as well as classifying aggregate operators supported by SQL. All of the nodes in the network have a catalog of attributes they support. As described previously the applications work with a query distribution and collection phase. The results they present in the paper lead the reader to believe that TAG optimizes communication cost as well as power consumption. Advantages: - Non-technical users can easily interact with the wireless sensor networks by using a declarative query interface (such as SQL). - The results they obtained show significant optimizations in bandwidth and power consumption. - Nodes have sleep and transmission cycles which helps the authors save power and optimize bandwidth usage. - The nodes within the network have good fault tolerance. Disadvantages: - Their implementation only considers periodic applications it does not take into account application operating at a constant rate. - The results they obtained are not necessarily always accurate. - The caching mechanism they use can affect the outcome of the system. From: dkassa2@uiuc.edu Sent: Tuesday, February 26, 2008 12:59 AM To: indy@cs.uiuc.edu Subject: 525 review 02/26 Rewiew 7: Paper Title: TAG: a Tiny AGgregation Service for Ad-Hoc Sensor Networks A generic aggregation service for ad hoc networks called Tiny AGgregation (TAG is developed. TAG provides a simple, declarative interface for data collection and aggregation. TAG is a simple, SQL-like declarative language for expressing aggregation queries over streaming sensor data. It distributes and executes aggregation queries in the sensor network. The paper shows that TAG yields an order of magnitude reduction in communication compared to centralized approaches. Some optimizations to improve the performance of TAG are also given. The paper gives a detailed presentation of the TAG algorithm. However TAG may suffer some scalability issues as the tree grows and as each mote rebroadcasts messages inserting its own ID and level. The overhead for constant topology maintenance and communication may dominate the sensor network. ==================================================== Review 8: Paper Title: DIFS: A Distributed Index for Features in Sensor Networks DIFS extends Geographic Hash Table (GHT) which is a key/value-based distributed index to support efficient range queries while maintaining balanced load across nodes. To achieve this DIFS constructs a multiply rooted hierarchical index that differs from traditional binary and quaternary trees in that non-root nodes can have multiple parents. Nodes store event information for a particular range of values detected within a particular geographic region. Higher-level nodes cover smaller value ranges detected within large geographic regions while lower level nodes cover a wider range of values from within a smaller geographic region. The key idea behind the construction of this hierarchy is to incorporate (in addition to the event name) the value of an event, as well the location of the detecting node in determining the storage node for that event occurrence. Using this index, DIFS can efficiently support range queries, queries related to the distributions of values in space and the like. A quad tree where each parent has four children one for each quadrant of the area the parent covers is included in the performance evaluation of DIFS to serve as an example of a conventional hierarchical approach. A structured replication method is also covered. DIFS scales well to large-scale networks by using a multiply rooted tree and a geography/value coverage tradeoff that balances communication overhead over many nodes. The paper explains DIFS in detail. I have no major negative comment against the paper. ============================= From: marefin2@uiuc.edu Sent: Monday, February 25, 2008 11:22 PM To: Gupta, Indranil Subject: 525 review 02/26 REVIEW1: DIFS: A Distributed Index for Features in Sensor Networks B. Greenstein, D. Estrin, R. Govindan, S. Ratnasamy, S. Shenker DIFS provides a structure for efficient range queries. It is based on GHT and quad tree. It divides the whole network into several quadrants. Higher-level nodes cover smaller value ranges within large region while the lower level nodes cover larger value range within small region. Each index node can have multiple parents. Nodes store information for a particular range of values detected within a particular geographic location. To build the hierarchy, a node storing an even forwards the information first to the local index node with the narrowest spatial coverage but covering the widest value range. This index node then forwards a histogram describing the values it has seen to a node with wider spatial coverage but narrower value range and so on. The convention by which index nodes are selected is similar to that of GHT. The range of values a child maintains in its histogram is bfact (=2i, where i >= 1) times the range of values maintained by its parents. This allows queries to be served from any node in the structure. Query by value starts from any node and propagates to the parent nodes up to the root depending on the value. Also DIFS can handle location criteria specified by a bounding box. Query then propagates to those index nodes that cover the any part of interest range. Also each index node in DIFS stores count of values below and above its own histogram ranges. It allows to response query by distribution like “What is the minimum value in the top ten percent of values?”. The deletion of high-level events can be done explicitly or with timeout. Another way is to use search-initiated deletion process, the delete message for a specific range of values will be propagated like the search message and delete the pointers accordingly on the node. Some of the mentionable points (pros and cons) of this system · Load balancing in maintained by giving smaller range of values to the nodes those keep higher range of geographical location and giving larger range of values to the nodes those keep small geographical location. . It also provides efficient support for range queries, queries about distribution of value and queries over specific space. . DIFS is scalable to a large number of searches or stores because queries need not to traverse the root every time and query and insertion can be initiated from any node in the tree. This feature makes DIFS better than quad tree structure. . For attributes those have dynamic nature of distribution over time are not candidate for load balancing in DIFS. . The system doesn’t include any measure or direction about node failure. Actually there is not much information about robustness in the paper. . Also it assumes the reliable transmission, which is not acceptable in wireless network. It is not sure how the system responses in case of data lose. The future work of this paper can be interesting, it can provide provision for multi-attributes and composite queries. REVIEW2: TAG: A TINY Aggregation Service for Ad-Hoc Sensor Networks S. Madden, M.J. Franklin, J. Hellerstein, W.Hong TAG provides a protocol for aggregate queries in low-power distributed wireless environment. It uses SQL-style query syntax for any queries. Queries are forwarded by broadcast. Any sensor node that receives the query for the first time, selects the sender as its parent. It then forwards the query request by broadcasting. Broadcasting ensures that all sensor nodes will eventually get the query request. An epoch is attached with each request. When a node forwards a request, it attaches the epoch time slightly less than the time it receives from its parent. This is because of some processing time in node. This phase is called distribution phase. In the collection phase, each node collects the event it observes from the environment for the query reply and aggregates it with the reply that it receives from its children and then passes it to its parent. Thus aggregated message is forwarded to the root node that initializes the query. Some modification can be done by taking some aggregate (MAX, MIN etc.) decision in the intermediate node and dropping the unnecessary messages. Also TAG supports group by query by keeping each group of information separately and passing it to the parent on each epoch interval. Some of the pros and cons of TAG that I have noticed are given below: · In TAG, each mote is required to transfer a single message per epoch, regardless of its depth in the routing tree, but in centralized approach motes at the higher level needs to send higher number of messages than that of leaf nodes. So in centralized approach, power consumption is not uniform and higher-level motes die out early. · Another advantage that TAG provide is that, it divides time into epoch interval, so sensor motes can be in deep sleep most of the time while saving energy. · TAG provides ability to tolerate disconnections and loss. Each mote maintains a neighbor table. When it detects that its parent has failed (not getting any response for a certain period of time), it selects another neighbor from the table as its parent. Also a node can change its parent by observing the communication quality with its parents. · In some case, TAG uses caching of partial states. When new values are not available due to lost child messages or failure of node, the parent node uses these old values in some manner to predict the output. · When a node fails (for any reason), the tree structure becomes disconnected. TAG requires some time to reconstruct the tree. As the query in sensor network provides output in a regular interval, this disconnection can cause loss of some information. This paper doesn’t consider pipelining of the communication interval, though the authors have considered it in their other work. The novelty of this approach is that it provides a simple solution that provides interface for energy efficient aggregation queries in sensor network. From: ysarwar@gmail.com on behalf of Yusuf Sarwar [mduddin2@uiuc.edu] Sent: Monday, February 25, 2008 10:53 PM To: Gupta, Indranil Subject: 525 review 02/26 DIFS: A distributed index for features in sensor networks Benjamin Greenstein, Deborah Estrin, Ramesh Govindan, Sylvia Ratnasamy, Scott Shenker The paper presents an indexing framework for making range queries over time series data in sensor network. Based on GHT, DIFS uses a multi-parent quad tree with clever range assignment in nodes such that any node initiating a query can quickly locate the nodes those are storing data in the range queried. As like GHT, data items in DIFS are hashed (indexed by some 'key') to a geographic point and nodes who reside closer to that point store those data items. The quad tree is built in a way that leaf nodes store events with wider value range but with small geographic coverage, whereas parents and grand-parents store data in smaller range, but for wider spatial coverage. Pros: - Multi-parent quad tree eliminates root to become bottleneck of all queries initiated by nodes. - Supports queries of various types: by value, over space, and by distribution.. Cons: - I am not pretty comfortable the way it's written... particularly the impact of bfact on quad tree formation and its role in query propagation is quite hazy and scattered..., an illustrative example could be of great help. ======================= TAG: A tiny aggregation service for ad-hoc sensor networks Samuel Madden, Michael Franklin, Joseph Hellerestein, Wei Hong, OSDI 2002 The paper presents a new kind of abstraction of sensor network operation, and formulates its operation as some SQL like aggregate queries on sensor nodes. It comes with the following features: - TAG proposes a SQL like query statement for data gathering in sensor networks. Sensor application running on top of the deployed network will periodically issue queries to the network, and the network will generate associated response against the query. The authors adopt most used SQL statement to formulate queries. Queries are made in SQL, and in network communication system renders this query into necessary message passing system, putting some sort of abstraction to the application. The entire query execution remains quite transparent to the end programming, or user. - TAG brings a new SQL clause EPOCH DURATION. Unlike database SQL query which runs a query just once, TAG query is run iteratively on the same statement each time generating temporal look of the whole system. It's something like a time series observation of the system and EPOCH DURATION specifies this cycle. - TAG supports MIN, MAX, COUNT, AVG, MEDIAN, HISTOGRAM, DISTINCT COUNT aggregation functions. - When data is delivered to the sink, in network aggregation on partial states is done. Each node when delivering its data to the query results, evaluates whether its contribution changes the final result, if not it does not make response. Thus message size and number of message to form the final result is dramatically reduced. - TAG uses simple tree-based routing protocol for data delivery. Leaves are the first nodes to initiate the query result, and they send their data to their parent. When parent gets all response from its children, send its partial aggregate to its parent, and it goes on. Nodes are given some disjoint time slots depending on their depth in the tree by which they are bound to generate response to the query so that the entire query can be answer by the deadline specified by the epoch duration. - TAG uses some channel optimization techniques like snooping, multiple parents and hypothesis testing for improved performance. Cons: - Although, TAG tries to aggregate results based on some aggregation function, it floods the entire with the initial query.... - TAG assumes sensor network operation as an aggregation query on relational database, but there may be applications where this paradigm may not hold... like coordinated sensing or coordinated tracking of moving objects or environmental events, where the query cannot be formulated as SQL like statement to address the entire network... Comments: - Based on TAG, the authors later developed a programming abstraction for sensor network, called tinyDB. ============================ From: Zixia Huang [zhuang21@uiuc.edu] Sent: Monday, February 25, 2008 10:46 PM To: indy@cs.uiuc.edu Subject: 525 review 02/26 Paper Title: DIFS: A Distributed Index for Features in Sensor Networks Author: Benjamin Greenstein, et al Summary: The main contribution of this paper is to present the design, analysis and numerical simulations of a distributed index than can provide for efficient index construction and range searches. It adopts the governing property to provide a load-balanced communication. The distributed index is built on top of geographic hash table. Simulation results show that under the conditions that there are many queries and each result set is small, DIFS performs best. The paper also discusses that the quad paper is not scalable to a large number of searches and stores though it is better than DIFS in terms of the aggregate communication cost of storage and search. Pros: (1) Discuss the situations that DIFS is better than the quad tree. (2) Detailed illustrations of background. (3) Good for scalability. Cons: (1) Aggregate communication cost of storage and search. Paper Title: TAG: a Tiny AGgregation Service for Ad-Hoc Sensor Networks Author: Samuel Madden and Wei Hong, et al Summary: The main contribution of this paper is to present the Tiny AGgregation (TAG) service for distributed wireless environment where low-power is favored. Different gerneric properties of aggregates are illustrated to demonstrate their influence over performance. Results show the advantages of using TAG over the traditional centralized out-of-network methods, which leads to an order of magnitude reduction in bandwidth consumpation. Pros: (1) Discussion of optimization and improving tolerance to loss (2) Bandwidth consumption reduction Cons: (1) Interface increases the complexity of wireless sensor system From: Hengzhi Zhong [hzhong@uiuc.edu] Sent: Monday, February 25, 2008 8:36 PM To: Gupta, Indranil Subject: 525 review 02/26 Hengzhi Zhong TAG: a Tiny Aggregation Service for Ad-Hoc Sensor Networks Summary: Power consumption may be dominated by send/receive messages, so it’s better to minimize this. One situation is in query answering. Answers can be aggregated before sending back to the sink nodes. That is, aggregation is pushed down the network. TAG recognizes this and provides a high-level programming abstraction for doing aggregation. The programming interface is similar to SQL. Several techniques for ensuring aggregation accuracies are proposed for certain classes of aggregates: snooping and hypothesis testing. Snooping seems to perform better. Pros: 1. provides a SQL-like interface 2. aggregates have a taxonomy and can be optimized according to their properties 3. TAG can use multiple paths to return answers. It does so by assuming each mote has 2 parents. 4. saves energy since the idling process is known Cons: 1. Is it possible to have a very deep network? The Tiny Aggregation algorithm for subdividing epoch doesn’t work well for deep trees. 2. TAG performance depends on network topology. DIFS: A Distributed Index for Features in Sensor Networks Summary: The paper uses a data-centric approach to avoid queries flooding. It uses a data-centric storage architecture to support range queries and uses DIFS, an index structure for search. DIFS combines GHT and a quad tree. DIFS uses hashing for its hierarchical structure and stores histograms at each internal node in the hierarchy. The key idea is to include event name, the value of the event, and the location of the detecting node for determining the storage node for the event. It achieves efficiency by using histograms and avoids load problems (such as the root node bottleneck) by using a multi-parent hierarchy. Pros: 1. lowers overall search cost (as long as you don’t count the tree construction) 2. maintains balanced load across nodes Cons: 1. higher storage-related costs 2. assume the range of the data value is known in advance. 3. assume sensor networks must be rectangular. This is not always true. 4. can’t deal with local conditions as seen in the analysis when the data value distribution is uniform 5. DIFS is designed for very specific scenarios (many queries relative to stores). How often does this scenario occur? Are these scenarios significant? From: Rahul Malik [rmalik4@uiuc.edu] Sent: Monday, February 25, 2008 8:12 PM To: Gupta, Indranil Subject: 525 review 02/26 DIFS: A Distributed Index for Features in Sensor Networks SUMMARY: In this paper, authors have proposed a distributed index for features in sensor networks. The main goal is to perform a search over semantically rich high-level events. In order to perform that, they have mentioned an algorithm that leads to efficient distributed index creation and leads to range queries being performed on that index. Previous approaches that were proposed were only for a single binary value, however range queries leads to much more robustness to system. Their approach uses GHT and quad-trees and improves upon their limitations. In a GHT, events are named with keys and a key is hashed to a geographic position. When a key has many events, then structured replication is used to distribute the data. In quad-tree structure, each index node maintains histogram describing distribution of four equally sized network quadrants. In DIFS, the key idea is to incorporate event value and location of detecting node in determining storage node for event occurrence. The high! er! level nodes stores small range of values detected within large geographic area whereas the lower level nodes stores wide range of values detected within small geographic region. PROS: One of the advantages of DIFS over traditional data-centric approaches is that range-queries can be performed on it. Due to structured replication in DIFS, every node is explored and the query load is greatly distributed and hence there is no bottleneck here. The overall search cost in DIFS is lower. DIFS performs best when there are many queries posed relative to the number of events generated, when each result set is small and when the result set is not likely to be distributed over the network in such a manner as to be discovered by most leaf level search nodes. CONS: DIFS does not perform well when there are more stores and less queries. It incurs a great storage-related cost than non-indexing techniques. Also, they have assumed that all the transactions are reliable and no packets are dropped and nodes are failure prone. This is not at all true in sensor networks where many packets related errors take place. Also, they have just described the index creation on one attribute and have not specified how to do so on multiple attributes. Also, they have just provided a simulation framework, but not implemented the system and tested on a real test-bed. FUTURE WORK: In future, I think that they should try to look at node failures as well as communication errors as they are also concerns in sensor networks and not just energy. They should also implement it on a real test-bed and look at system performance. TAG: a Tiny Aggregation Service for Ad-Hoc Sensor Networks SUMMARY: In this paper, authors have described an aggregation service for ad-hoc sensor networks. The main motivation behind TAG is that aggregation should be a core service in sensor networks, rather than being an extensible API. The reason for this is that power is central issue in ad-hoc networks and as a result, in-network processing can be performed much more efficiently then transmission. So, in the paper, they have described a simple SQL style interface for data aggregation and collection. Their scheme intelligently distributes and executes aggregation queries in the sensor network in power as well as time efficient manner. The algorithm described is as follows. The users pose the aggregation queries from a base station that propagates through a routing tree. They implement aggregate via three functions, merger function, initializer and evaluator. The time is divided into epoch and in each epoch, sensors route local and children data back to the user using routing tree. As the! d! ata flows up the tree, it is aggregated according to the aggregation function and the value based partitioning specified in the original query. PROS: The system proposed is a very helpful one in sensor networks because aggregation is one of the main requirements of a sensor node. The SQL style interface described by the system makes networking tasking easier for the user who does not have to modify low level code or worry about topology, routing and loss tolerance. The current system leads to substantial reduction in bandwidth and power consumption. Also, it performs better than most centralized schemes due to aggregation. CONS: One of the problems in the current system is that the number of groups can exceed the total storage on any device. The current memory eviction scheme that they proposed, the best and worst schemes had only a little difference. The eviction schemes described in the paper are not very strong ones. FUTURE WORK: In future, I think that they should implement a better memory eviction scheme as I think that the current proposed scheme is not very strong. From: emenese2@uiuc.edu Sent: Monday, February 25, 2008 4:10 PM To: Gupta, Indranil Subject: 525 review 02/25 Paper: TAG: a Tiny Aggregation Service for Ad Hoc Sensor Networks Reviewer: Esteban Meneses (emenese2@uiuc.edu) One of the typical applications in a sensor network is to aggregate the sensed data into one single point. This mechanism provide an opportunity to perform in-network processing (the data is processed inside the sensor network itself instead of other external node) as well as to save energy consumption. TAG is a generic service to aggregate data in a sensor network. It provides a declarative interface for data collection and aggregation, derived from the selection facilities in SQL. Also, TAG is able to distribute and execute aggregation queries in the sensor network considering time and energy consumption issues. The way it works is by letting the users to pose queries from a powered basestation. The operators that actually implement the query are distributed in the network by piggybacking over the existing networking protocol. A tree routed in the basestation is used to route data. As data is sent up in that tree, it is aggregated according to an aggregation function and value-based partitioning specified in the query. One important point is that TAG is oblivious to the routing algorithm used in the network. The selection of SQL corresponds to the need in offering a simple and expressive language to non-technical users. These people at the basestation will execute queries over a table called sensors and with a known schema. The SELECT queries used in TAG have the following components: WHERE, GROUP BY, HAVING and EPOCH DURATION. With exception of the last clause, all the other parts of the query follow the typical SQL semantics. The duration of each epoch specifies the amount of time devices wait before acquiring and transmitting each successive sample. The authors also provided a classification of the aggregates according to four properties. The first dimension is duplicate sensitivity. Duplicate insensitive aggregates can deal with duplicate readings (think of the max of a set of values). The second dimension is whether the aggregate is exemplary (returns one or more representative values from all the set of values) or summary (compute a property over all the values). The third dimension tests when the aggregate is monotonic (the aggregation of two values forms either strictly increasing or strictly decreasing). The last dimension deals with the space required for each partial record. TAG uses in-network processing for answering a query. This provides an effective mechanism for saving energy and providing a reasonable response time. Also, the use of SQL as the query language makes it more likely to be adopted by non-technical communities. The expressiveness of this language is likely to be enough for most of the applications of sensor networks. TAG is able to tolerate disconnections and loss. As the nodes are constantly beaconing to update the topology, it makes the network more reliable to unexpected changes. Other advantage of TAG is that in every epoch each node is supposed to send only one message independently of his depth in the tree. By dividing the time in epochs, TAG offers a convenient mechanism for idling the processor and thus saving energy. The main drawback with TAG resides in the fact that node failure resilience is not provided while the query is executed. It is not clear what happens with a failed node that is the root of several other nodes. In comparison with directed diffusion, TAG offers a more expressive language for the queries. I wonder if there exists any other data structure, other that the simple spanning tree that can bear several node disconnections and still offering a connected structure. Some kind of “hyper tree” that offers this property would be useful for minimizing the rate of broadcasting the topology messages. As TAG obtains values of a time series, it provides a means for some kind of datawarehousing. I wonder whether it would be possible also to do some data mining processing in-network with all this data. How is a query aborted? I didn't understand the way this mechanism can work in TAG. I am not convinced that TAG can be completely agnostic of the underlying routing protocol. It seems a good argument for end-to-end communication, but perhaps several optimizations are obscured by this assumption. This also implies that the energy consumption issue is not well addressed in the paper, neither by an analytic study nor by experimentation. On the other hand, I don't believe that the epoch interval must be a constant. I would rather prefer it to be dynamic, as the circumstances of the environment dictate how frequent I should be sampling. For example, think of a sensor network that measures the volume of vehicles in a city. It certainly should sample more quickly at rush hours, giving that mobility is higher at that time. For me, caching experiments must be directly related to the mobility of the nodes in the network. It is unrealistic to separate both, given that several measurements can be really stale if nodes move frequently. Paper: DIFS: A Distributed Index for Features in Sensor Networks Reviewer: Esteban Meneses (emenese2@uiuc.edu) DIFS is similar to TAG in that it diffuses queries into a sensor network, but provides high level information according to semantic specifications. The main idea resides in the fact that nodes produce time series data. That data can generate high level events after several patter recognition techniques are applied to it in a local fashion. Then, information about attributes of these events (which are locally stores) is obtained and it is stored into indices. Finally, a human or other intelligent agent can pose queries to such indices. Nodes have two functions: they can store raw time series and in some special cases they can also server as index node to facilitate search. DIFS is based in a couple of important concepts: GHT (Geographical Hash Table) and Quad Trees. A GHT provides a distributed index based on key-value associations. Events are named with keys, the storage and retrieval of an event are performed using these keys. In a GHT a key is associated with a geographical position. By hashing keys, a GHT spreads the load due to different keys evenly throughout the sensor network. The structure replication mechanism is also incorporated in the GHT (it uses hierarchical decomposition of the geographic space). The problem with GHT strives in the fact that keys are event names and ranges queries over the values associated with an event are not efficiently supported. For that matter, a spatially distributed quad tree of histograms is used to index values in a sensor network. The quad trees offer an efficient searching, given that histograms can redirect queries to the relevant nodes. DIFS then extends the concept of GHTs to support efficient range queries while keeping load balancing across nodes. DIFS always maintain a rooted hierarchical index that differs from traditional binary and quaternary trees in that non-root nodes can have multiple parents. Nodes store event information for a particular range of values detected within a particular geographic region. By having this mechanism, DIFS can support range queries related to the distribution of values in space. DIFS builds a hierarchy of histograms to guide the search for queries. It is based in the following principle: the wider the spatial extent an index knows about, the more constrained the value range it covers. DIFS permits to query the sensor network by value range, over space and also by distribution. The main advantage of DIFS over other distributed indexes is that is builds a reliable structure for keeping track of the ranges in the queries. The quad tree is more resilient to node failures than a simple tree but it also supposes more overhead in maintaining it. Another advantage of DIFS is the flexible type of queries a user can pose. The range query is reasonably useful for many applications, but DIFS also provides a means for distribution queries that are more representative in certain contexts. I think DIFS covers many of the important information needs for a user of a sensor network. I wonder if it is possible to extend the potential of DIFS by adding another data mining techniques. For example, an alarm event can be executed when the distribution of certain values follows a particular pattern. More specifically, an alert can be raised when the CO2 measurements for certain region in a mine don't follow an expected curve during the day. I was not convinced about the low cost of keeping all the quad tree. It seems that keeping up to date all the topological information is more expensive in the case of DIFS. Also, the energy consumption was not seriously addressed in the paper. This is the initial phrase of the paper, but the authors loose it along the paper. From: Anthony Cozzie [acozzie@gmail.com] Sent: Sunday, February 24, 2008 7:44 PM To: Gupta, Indranil Subject: 525 review 02/26 Both of the papers for this week are concerned with getting data out of sensor networks, while optimizing for latency, power, and reliability. TAG TAG makes two contributions: an SQL like query language for sensor net queries (since the sensor nodes effectively form a database of the information they have obtained) and a method for in-network processing in order to get the information out of the network cheaply. I am not convinced of the usefulness of the query language - this may be an example of a bit of overgeneralization, but it certainly isn't harmful. The basic idea of the in-network processing is that some common operations (like max, average, sum, etc) can be aggregated in the network to reduce the amount of traffic. In other words, if a node gets a query for sum and receives 10 responses, it can forward one message with the sum of those 10 responses, thus reducing the amount of network traffic by d, where d is the average distance of a sensor node to the sink. From a technical standpoint, this is not completely trivial, because of the inherent multicast capability of a wireless link. For example, simply forwarding all transmissions heard back to the sink could lead to duplication if a transmission is received by multiple forwarding nodes (probable). There are also timing issues: parents must wait long enough to hear from all their children. TAG solves these problems with a simple epoch system. Evaluation: seems like a solid paper, although not incredibly imaginative. Probably the only way to improve on TAG is to try to design a more efficient means of aggregation, but it's difficult to see how any order-of-magnitude improvements are possible. Another interesting thought would be probabilistic variants of some unaggregateable metrics. I'm not sure if this even makes sense. Time-indexed sensor networks The basic idea is to store the data in the network on RP nodes. A group of RPs represents a dominating set (a slightly cheaper version of vertex cover) for the network, so each message is forwarded only one hop. Since the data is time-indexed, there is a corresponding time-index set of sets of RPs. The RPs form an overlay network and have queries routed to them. Since min vertex cover is NP-hard (actually one of the "original" NP-hard problems in Karp's paper) finding the optimal assignments is also NP-hard, not to mention the amount of messages that would need to be exchanged for this to happen. Despite a large evaluation section, I still wasn't convinced about the actual utility of the algorithm, because only one graph is devoted to the overall performance of the algorithm (network lifetime), while most are devoted to proving that their Newton's method provides a reasonably good solution for the sets of RPs. It is very obvious that the performance of local storage, external storage, and RP storage depends *greatly* on the relative frequency of data samples and queries. If data sample frequence is hundreds of times greater than query frequency, local storage will be best, while if the reverse is true external storage will be best. RP storage *may* be the best in the middle (although I'm *still* not convinced of this) but the question is how wide is the middle, and how do those ratios compare to real world apps (although I admit these numbers are difficult to get). Second, I suspect that a large amount of the gain from RP sensing comes from leaving the non-RP nodes idle, and I am not convinced this is quite a fair comparison (there are plenty of ways to manage mostly-idle-nodes). I guess I will get my questions here answered on Tuesday :)