From: Long Hai Vu Sent: Tuesday, February 21, 2006 9:43 AM To: Indranil Gupta Cc: Long Vu Subject: CS598ig review 02/21 Class: CS598IG-Spring 06 Date: 02/21/2006 Name: Long Vu REVIEW PAPERS TAG: a Tiny Aggregation Service for Ad-Hoc Sensor Networks In this paper, authors present the idea of Tiny Aggregation service in low-power, distributed, wireless environments. This service is considered a core service provided by system software rather than a specific mechanism programmed into hardware device. The proposed approach allows user to query network declaratively by an interface which helps collect and aggregate data. The service is also able to distribute and execute aggregation from nodes to a base-station in the network. This paper is a combination between a proposed SQL-like language and routing protocol in sensor network to perform aggregation task. The followings are contributions: - They propose a SQL-like language to both represent and process query in sensor streaming data - They apply this language in network and get the order of magnitude reduction in computation compared to centralized approach - Usage of this SQL-like language could provide transparency between end users and core system - They also show that this kind of high level SQL-like language could reduce the consequence of losing result in the network Discussion: 1. What happens if after a node receives its computational quota from the base-station but when it performs its task, its battery is run out of? We may need a solution to be aware of node’s battery and use a scheme to update node’s battery status over the network. In essence, the rest of battery could be managed by relying on the number of messages a node has sent and the amount of computation it has performed so far. This may help base-station select suitable nodes to distribute task and computation. 2. How about partial computation in this kind of sensor network? In general, aggregation includes computation of summary, maximization, counting, minimization, etc. If an immediate node waits for all lower level nodes’ returned data, it may be run of our buffer to hold this immediate result, especially in huge sensor network. Classifying task into categorizations and apply suitable techniques (like pipelining) could improve system performance. For example, for some query, a node could send the immediate results to the higher level node without waiting for later data from lower level nodes. However, we could not apply this technique for most of aggregations such as min, max, sum, count, median, etc. As a result, setting buffer size of immediate node raises a question. DIFS: A Distributed Index for Features in Sensor Networks In this paper, authors extend the data-centric storage architecture for range query support. As stated, the proposed distributed index (DIFS) provides for low average search and storage communication requirements. Moreover, DIFS could support both binary and range queries and save on communication overhead as only the high level events are communicated to avoid the flooding of query. They introduce high-level events and four query categories such as sensor value, timing parameter, spatial dimensions, and relationships between events. The most significant contribution of this paper is constructing distributed indexing of high-level events. This approach preserves load balancing, searches fast, and supports query on distribution of data. However, the tree architecture of DIFS may result in bottle neck at immediate nodes and root. Future work: - Collect data, implement and evaluate DIFS in a real system is a need to prove its efficiency - Deal with transaction loss - Improve the root of DIFS tree to avoid bottle neck and get load balancing as the root attends in every search From: Juan Jose Jaramillo Jimenez Sent: Tuesday, February 21, 2006 9:42 AM To: 'Indranil Gupta' Subject: 598ig review 02/21 1 DIFS: A Distributed Index for Features in Sensor Networks 1.1 Summary In this paper the authors consider searches over high-level events, rather than the traditional approach of standard queries that focus only on returning data statistics, where all data is gathered at a central storage and processor center external to the sensing environment. They say the traditional approach is not energy efficient, since it involves sending all measurements to a central repository, instead of doing aggregation at intermediate nodes. DIFS then tries to extend the data-centric storage architecture to allow for range queries, providing low average search and storage communication requirements. To achieve these goals, the authors propose the classification of events according to the following properties: sensor value, timing parameters, spatial dimensions, and relationship between events. DIFS builds on top of GHT (Geographic Hash Table), extending it to support efficient queries while maintaining balanced load across nodes. To achieve this, DIFS constructs a multiply rooted hierarchical index where leaf nodes can have multiple parents. The authors claim that DIFS performs better when there are many queries posed relative to the data generated, when the data is small, and when the events are sparsely distributed. 1.2 Comments and ideas for future work DIFS seems like an interesting idea, but the same authors claim that it works only for a specific setting, where many constraints are being assumed (see last paragraph of previous section). The question here is whether this approach can gain some relative success if it is focused on only one case, or if whether it is better to try to achieve a more general algorithm that does not perform so well than DIFS for this environment, but performs better in most scenarios. 2 TAG: A Tiny AGregation Service for Ad-Hoc Sensor Networks 2.1 Summary TAG is a service for data aggregation that allows users to express simple, declarative queries. The position of the authors is that since aggregation is fundamental to efficiency in sensor networks, it must be provided as a core service by the system software, consisting of a generic, easily invoked high-level programming abstraction. The motivation behind the project is that the users of sensor networks are often scientists that lack fluency in embedded software development but are interested in using these kinds of networks to further their own research. TAG then provides an interface for data collection and aggregation inspired by database query languages, specifically SQL: the user poses aggregation queries from a base station; sensors then route data back towards the user using a rooted tree algorithm, where the root is the base station. As data flows up, it is aggregated according to the aggregation function and value-based partitioned as specified in the query. TAG consists of two phases: (1) a distribution phase where aggregate queries are pushed down into the network, and (2) a collection phase where the aggregated values are continually routed up from the children to the parents. The authors claim that besides dramatically reducing the communication requirements, TAG has other benefits: tolerates disconnections and data loss, each node is required to send only one message per request, regardless of its depth in the routing tree, and since it divides the time in epochs it creates a natural way to put on standby the nodes, allowing for further power efficiency. Through simulations and analysis, the authors show that TAG can achieve an order of magnitude reduction in bandwidth consumption over approaches where data is aggregated and processed centrally. 2.2 Comments and ideas for future work In my concept, the most important contribution of this paper is to realize the fact that sensor networks will be used by people that are not software experts, but require the capabilities of these networks to do research, and therefore it is important to develop models for event gathering and recollection that are both efficient and easy to use. To do that, they consider that these services must be embedded in the OS of the nodes, making it an integral part of it. My biggest critique to the paper is that they appear to present the concept that TAG is the first approach to consider in-network data aggregation, and then compare TAG with approaches where data is aggregated and processed centrally. It would be more interesting to see the performance of TAG benchmarked against other similar approaches, like for example DIFS. From: mbakht@gmail.com on behalf of Mehedi Bakht Sent: Tuesday, February 21, 2006 9:28 AM To: Indranil Gupta Subject: CS598 IG REVIEW 02/21 Netid: mbakht2 Title: a Tiny Aggregation Service for Ad-Hoc Sensor Networks Summary: The paper presents TAG – a service that does in-network processing of data based on declarative queries. The main motivation for in-network processing is to lower power consumption in sensor networks. According to the paper, transmitting a single bit of data in TinyOS motes is equivalent to executing 800 instructions. To take advantage of this energy tradeoff, TAG processes data inside the network rather than simply transmitting the sensor readings. To efficiently process this aggregation, TAG uses a simple, SQL-like declarative language for expressing aggregation queries and identifies and makes use of some properties of the key aggregation functions. A number of optimizations have been also proposed in the paper to further reduce the data demands on the system. TAG is not based on any specific routing technique. It only requires that the routing algorithm fulfill two conditions. One condition is to ensure that all nodes can receive each and every query request and the other is to ensure that there is at least one path from every node of the network to the root collecting the aggregated data. Time is divided into epochs in TAG. In each epoch, each node samples local sensors once and generates a partial state record that contains both local reading and readings from children. This partial state record is then passed to its parent. At the end of epoch, partial state record of the whole network is collected at root. The paper presents taxonomy of aggregate functions where they have been classified according to four properties. These classifications were later used to show how performance of TAG varies depending on the class to which the applied aggregation function belongs. Finally, the authors present the result of simulation of TAG that is followed by discussion of a prototype implementation that shows TAG performs significantly better than a centralized approach for data collection. Comments: There seems to be no concept of load balancing here. Nodes, which have more children and are placed higher up in the routing tree, will probably have to do much more computation. Title: DIFS: A Distributed Index for Features in Sensor Networks Summary: The paper presents DIFS – a distributed index for sensor networks particularly suited for range queries. It provides low average search cost and storage communication requirements while balancing the load across participating nodes. The working of DIFS is based largely on geographic hash tables (GHT). Direction of data propagation is dependent on the structure of the DIFS tree. DIFS trees have the special property that every node (except root) has bfact parents. Another important property is that the wider the spatial extent of a node in the DIFS tree, the more constrained the value range it covers. Experiment results at the end of the paper shows that DIFS is not always the obvious choice. DIFS performs best when there is many queries posed relative to events generated and the result set is not spread over most leaf-level index nodes. Structured replication is preferred over DIFS when many more events are stored than searches posed and most leaf-level mirrors point events that fit a query instead of a range search. Quad tree outperforms DIFS in terms of aggregate communication, but not in terms of bottleneck communication. Comments: It seems that there is very limited scope where DIFS can be efficiently applied. It would have been better if the authors could give an estimate about what percentage of database queries are basically range searches. Since its relative efficiency also depends on the ratio of events generated versus search posted, it is not suitable for a network that where query pattern can vary wildly. From: Mike Earnhart [mearnhart@gmail.com] Sent: Tuesday, February 21, 2006 8:45 AM To: Indranil Gupta Subject: 598ig review 02/21 Michael Earnhart CS598IG February 20, 2006 Summary TAG: A Tiny Aggregation Service for Ad-Hoc Sensor Networks And DIFS: A Distributed Index for Features in Sensor Networks In A Tiny Aggregation Service for Ad-Hoc Sensor Networks the authors are attempting to generalize the aggregation of data in a sensor network. They claim that this is such a necessary part of sensor networks that it should be a integral part of all sensor networks. Their goal is to provide the tools for researchers who are not computer scientists to query the sensor network and retrieve results to be used by other programs and users. These queries provide aggregation whenever possible via in-network processing to minimize data transferred. This is because the energy it takes to transmit one bit equal nearly 800 times the energy to execute 1 instruction. Therefore it is imperative to minimize data transmission. The importance of this paper is primarily in the classification of aggregates. This definition provides an essential building block for all sensor networks that provide low power transportation. Another major contribution this paper made was the SQL like query syntax. This was a rather interesting development for sensor networks; to treat them like a database of information to be queried. SQL was also a well-defined language, which allows for rapid development and integration into programs. Additionally they provide a clear time division into epochs so that cpu idle time can be easily predicted and sleep mode can be enabled to save power. This is convenient for these systems because they are event driven it is often difficult to identify moments where no activity is expected. Finally the idea of a hypothesis combined with snooping adds another level of reduction in network activity. The problems that are very evident is that it complicates sensor networks significantly, and without their optimization it is clearly not enough to simply aggregate within the network. The implementation shows a significant problem in reliability due to a lack of optimizations. Hence it seems unlikely that this could ever be implemented without all of the complex optimizations and I find it less likely to be utilized because of the complexities. In DIFS they focus on improving data centric storage. Their improvements are such that one should be able to query for data in ranges and receive quick and energy efficient responses. There are four classifications presented similar to what TAG did with the aggregates. They define sensor value, timing parameters, spatial dimensions, and relationships between events. Many events fall under more than one category and this is because they are focusing on a high level abstraction to minimize data. Again they feel that processing the data in the network to provide the higher-level information for the query is the most effective way to reduce the energy cost. Also it is emphasized in this paper that searches and queries should be fast, flexible, and reliable. The strength of this paper is in their obvious discovery that the higher the level of abstraction the less data yet they claim that a high level abstraction can be achieved using sensor nodes. Their contribution of the method for defining a high level event within this energy constraint network is very important and is the absolutely necessary item for a system such as this to even function. One of the weaknesses is the lack of implementation, which seems so crucial to any wireless application because so little can be accurately simulated. In addition their higher-level abstraction may cause issues because of the need to specify such a large number of high-level events. This stems from the idea that these higher-level events are more specific and hence require more in number to retrieve all the data that is required. However if the data that is required is very specific clearly this will save on energy dramatically. From: Ravishankar Sathyam Sent: Tuesday, February 21, 2006 6:55 AM To: Indranil Gupta Subject: 598ig review 2/21 Ravi Sathyam A Framework for Time-Indexing in Sensor Networks Summary: This paper proposes a new way to store and retrieve time-indexed information in sensor networks. Normally, time-indexed information (such as temperature from 10 pm to 11 am on Feb 18th) is stored either in distant locations (incurring significant communications overhead) or is hashed to geographically close set of nodes (which has fault-tolerance problems if an environmental disaster can wipe them out). Hence, a unique solution would be the use of geographically dispersed Rendezvous Points (RP’s). A carefully designed schedule changes the RP’s from time-to-time. The focus on this paper is this particular utility-based RP election mechanism. Given that at a point in time, all the RP’s in the network form a dominating set in the graph, one can come up with a formula for the average power consumption of the network. If we assume that power consumed by an RP > power consumed by a non-RP, then along with some mathematical manipulation, one can conclude that in order to minimize the overall average power consumption, one must select a minimum dominating set with the least number of RP’s for each time slot. One can also show that given a utility function, one can come up with a non-linear integer programming problem with linear constraints to solve (the optimal solution for each NIP is also shown). The paper then presents the optimal solution to the RP election problem under the logarithmic utility function. The heuristic solution consists of a voting scheme that makes each node to be an RP for a finite time slot, and consists of three phases. In the Bidding phase, each node generates a RV bid and sends the bids to its neighbors. Upon reception of bids from all of the neighbors, a node can then decide which of its neighbors has the highest bid and sends this data to all neighbors, In the Election phase, nodes can determine from bids as to whether it should be an RP or not. A node becomes an RP if either it has the maximum bid amongst all its neighbors, or it is the maximum bidder amongst a set of its neighbors. In the Amendment phase, a node can serve as an RP if none of its neighbors do so. The paper then describes information retrieval from such a network. Since for each time slot the set of RP nodes form a dominating set, one can just query RP’s of that particular time-slot for the data for that particular time slot. Each RP for a time-slot keeps a list of RP’s in its neighbor and routes to reach them. In order to do so, each RP in a particular neighborhood of RP’s broadcasts overlay construction messages with TTL = 3 (since RP’s form a dominating set). The paper also provides a performance evaluation of the protocol. From this analysis, it seems that the optimized solution of RP election provides much less power consumption, overhead in building overlay and processing queries when compared to randomized election of RP’s. In addition, the network lifetime under the RP-sensing model is about twice as much as the all-sensing basic model (both having time-indexed storage). Comments: I did not understand the reasoning behind the Amendment phase. However, the most major accomplishment was doubling the network lifetime (which was a major intended problem to be addressed) TAG: a Tiny Aggregation Service for Ad-Hoc Networks Summary: TAG is an aggregation service for ad-hoc networks of sensor motes. It provides for two things – An interface for data collection and aggregation, as well as distribution and execution of aggregate queries intelligently in a time and power efficient manner. TAG provides for its own language (inspired by SQL) for users to express queries. In general, TAG queries have the following structure: SELECT aggregate expression of (attribute) from SENSORS WHERE (select your preferences) GROUP BY (attributes) HAVING (having certain preferences) EPOCH DURATION a certain unit of time Such TAG queries result in a stream of values. TAG consists of various types of aggregates, and consists of two phases. The first phase is the distribution phase, where aggregate queries are sent down the network. Query distribution occurs as follows: When a node receives a request from another node, it designates the sender node as the parent and sends information about the query, as well as the time interval in which it is expecting to hear back from, to its children. Here, children are determined by a broadcast to the network to determine nodes that have not heard the message in the previous round. In the collection phase, one must ensure that as few messages are used as possible. In order to accomplish this, parents wait until hearing back from their children before propagating a partial state record for the current epoch. From the paper, it seems that a parent choosing the time-interval for its children is like a decision-making problem that’s application specific. The two phases provide for functionalities such as GROUP BY and HAVING clauses, which are identical to the clauses in SQL. The paper goes on to explain the advantages of TAG - An ability to tolerate disconnections and data losses, and the fact that the data transmission is significantly much less as compared to centralized aggregation (since in most cases each node is required to transmit single message per epoch regardless of its depth), and also the ability to save more power by idling the processor and radio (by dividing time into epochs). The paper finally presents the implementation results and performance analysis of TAG. From the analysis, it seems that while aggregate functions like MAX, COUNT and AVERAGE have much better performance (in terms of bytes transmitted) than centralized aggregation, other functions such as MEDIAN and COUNT DISTINCT have similar performance as centralized (since parents have to forward all of children’s values to the root). Also, it is easy to see that in certain topologies (such as flat topologies where a lot of nodes are connected to the root), TAG does not perform so well as in hierarchial topologies. Also, an interesting case study was that of the effect of node errors on the values of the aggregate functions. Failure of nodes had little effect on aggregate functions such as AVERAGE as compared to the COUNT aggregate. Comments: Interesting idea to use SQL-style queries in sensor networks - a simple yet effective idea of representing aggregation of data while minimizing transmission of messages! From: ercanucan@gmail.com on behalf of Ercan Ucan Sent: Tuesday, February 21, 2006 3:52 AM To: Indranil Gupta Subject: "598ig review 02/21" TAG ------- Problem Addressed: In this paper, Tiny Aggregation (TAG) service for aggregation in Ad-Hoc Sensor networks is presented. It allows the users to express simple declarative queries and have them distributed and executed in a power-efficient manner in ad-hoc sensor networks. Moreover, this scheme is aimed to be sensitive to the resource constraints and lossy communication properties of wireless sensor networks. Contributions: Simple, SQL-like declarative language for expressing aggregation queries over streaming sensor data is proposed. Also, the key properties that affect the efficiency of this in-network processing are identified. Communication efficiency that comes with this approach as opposed to classical central approach is demonstrated. The abstraction described in the first contribution makes it possible to apply more optimizations transparently to the scheme. Useful end-to-end techniques for reducing the effects of network loss on aggregate results were the by-product of this research. Approach taken & Comments The tree based routing algorithm on which TAG would run should provide two capabilities. First, it must be able to deliver query requests to all nodes in the network. Second, it must be able to provide one or more routes from every node to the root of the network where aggregation data is being collected. Also, these routes must guarantee that at most one copy of every message arrives. TAG has two phases: Distribution phase and collection phase. In the distribution phase, aggregate queries are pushed down into the network. In the collection phase aggregate values are continually routed up from children to parents. They solve the problem of collection phase ensuring that parents in the routing tree wait until they have hear from their children before propagating an aggregate value for the current epoch as follows: They subdivide the epoch such that children are required to deliver their partial records during a parent-specified time interval. Also they select these intervals such that there is also enough time for the parent to combine partial state records and propagate its own record to its parent. For a significant portion of each epoch, motes are idle and can enter a low power state and this is quite beneficial in terms of power saving. The approach is also extended to support grouping. When a node receives an aggregate from a child, it checks the group ID of the child and based on that either aggregates the messages with other's children's(if in the same group) or forwards it along with its own message(if in some other group) to the parent. The in-network approach described in the paper can provide an significant order of magnitude reduction in bandwidth consumption over classical approaches. However, I think that the most important contribution of the paper is their approach to the problem in a systematical way such coming up with a taxonomy of aggregates. DIFS ------- Problem Addressed: In this paper, authors considered searches over semantically rich high-level events and presented the design, analysis and numerical simulations of a spatially distributed index that provides for efficient index construction and range searches. Also, the scheme discussed in the paper provides load balanced communication over index nodes. This communication basically uses the governing property that the wider the spatial extent known to an index node, the more constrained is the value range covered by that node. Approach Taken & Comments: DIFS is built on top of Geographic Hash Table (GHT). DIFS extends GHT in order to support efficient range queries. At the same time, it aims to maintain the load balance across the nodes of the network. Basically, DIFS achieved these aims by constructing a multiply rooted hierarchical index that differs from classical binary and quaternary trees in the property that non-root nodes can have multiple parents. Nodes store event information for a particular range of values detected within a particular geographic region. The main idea in using this structure is to incorporate the value of an event as well as the location of the detecting node in determining the storage node for that event occurrence. In this approach, the queries are again data-centric but the queries and their responses deal with higher-level abstractions. In order to generate these high-level events time series data is locally processed by statistical and pattern recognition engines. Then, these events are stored locally at the point that they are created and information about their attributes is inserted into indices. Queries are posed to these indices and the results are found in the indices themselves, storage nodes and even at the nodes that generate time series data. Analysis show that, DIFS has the best performance when there are many queries given to the system relative to the number of events generated, when each result set is small and when this result set is not likely to be distributed over the network in such a manner as to be discovered by most leaf level search nodes. Also, the authors discuss that the qual tree described in the paper works better than DIFS, however, it is not scalable to a large number of searches and/or stores. DIFS scales well to the large network structures by having a multiply root tree and a geography/value coverage trade-off that balances communication overhead over many nodes. -- Ercan Ucan - eucan2@uiuc.edu Graduate Student Computer Science Department University of Illinois at Urbana-Champaign ------------------------------------------------------------ From: Muyuan Wang [mwang2@uiuc.edu] Sent: Tuesday, February 21, 2006 12:57 AM To: Indranil Gupta Subject: 598ig review 02/21 TAG: A Tiny Aggregation service for ad-hoc sensor networks Summary: First the author argues that, in the emerging nowadays sensor network applications, the aggregation plays a very central role. Therefore, it must be better to provide the aggregation as a core service by the system, rather than by let the user modify the low level code to meet their real needs. So they proposed a generic aggregation service for ad hoc networks. It not only gives a simple interface for data collection and aggregation, but also has the ability to distribute and execute aggregation queries in the efficiently, i.e. the power usage is reduced, and the processing speed is much faster. In the proposed framework, first a SQL-like query language is introduced, with the output as a stream of values. Then the structure of aggregate queries is discussed. The aggregate query is implemented using three functions: a merging function, an initializer, and an evaluator. After that, the taxonomy of queries is also discussed. By utilizing the properties of these aggregates, we can perform the aggregate efficiently and incrementally. Four dimensions of properties are described in the paper, which are: duplicate sensitivity, exemplary and summary, monotonicity, and the amount of state required for each partial state record. In the simulation, the author shows that the aggregate queries can be distributed executed in the network efficiently, with reduced bandwidth consumption. Comments: The paper first put forward the idea of doing aggregates in network, instead of doing it in the centralized controller. The aggregation is then provided to the user as an important service, so that user need not bother to implement the mechanism themselves. This is really a great idea. The problem of this TAG approach is, although it can reduce the energy required to respond to queries, but it also utilizes the flood and response approach, which may causes a lot of overheads in communication. DIFS: A distributed index for features in sensor networks Summary: Similar to the TAG paper, this paper also considers searches over semantically high-level events. But the concentration of this paper is to design a distributed index, called DIFS, for indexing and search of the high-level semantic events generated previously in sensor networks. This approach is built upon the 'data centric storage' methods such as GHT. But unlike GHT, which can only report whether a certain event had occurred, it uses an extended data-centric storage architecture to efficiently support range queries. In other words, the query can be described using a range value. And the query can be either made by value or by space. In DIFS, a multiply rooted hierarchical index is constructed. The index is similar to the famous quad tree to some degree. They both build a spatially distributed index of histograms. However, differs from the traditional quad trees by letting non-root nodes to have multiple parents. In that way, the problems of using quad tree (the heavy traffic of the only one root node) are overcome. In DIFS, nodes store event information for a particular range of values detected within a particular geographic region. Higher-level nodes cover smaller value ranges detected within large geographic regions. DIFS uses a geographical hash within a hierarchically decomposed key space. The wider the spatial extent an index node knows about, the more constrained the value range it covers. As a result, DIFS can support efficient range queries while maintaining balanced load across nodes, while providing the search efficiency of a quad tree. Comments: This paper follows the TAG and GHT papers' work, trying to distribute the work in sensor networks from a centralized node, efficiently. The concentration of this work is on the storage part. The extended usage of quad tree seems elegant and convincible. However, it assumes the distribution is known a priori, which impairs the usability of this work. It would be better that a mechanism dealing with distributions that change during the system running can be proposed. From: Raghu Kiran Ganti Sent: Monday, February 20, 2006 6:39 PM To: Indranil Gupta Subject: 598ig review 02/21 Review for In Network Processing (Feb. 21 2006) ---------------------------------------------- Paper title: TAG: A tiny AGgregation Service for Ad-Hoc Sensor Networks Summary: -------- This paper introduces a SQL type query language for data collection in sensor networks. The authors present an aggregation service for a large scale distributed wireless sensor network. The high level query language abstracts the details of the issues with respect to routing and network losses from the end user. The authors also provide a classification of the aggregates and provide optimizations through in network aggregation. Main ideas: ----------- The main ideas of the paper are a SQL type declarative language for query processing and in network aggregation for providing network optimizations (in terms of number of messages transmitted). Simple SQL like interface is provided with minor modifications to the end user. One of the modification is the introduction of EPOCH keyword, which is used to indicate the duration of wait time before transmitting the next sample. The authors classify the aggregates into four categories- duplicate sensitivity, exemplary/summary, monotonic, and amount of state required for partial state record. A scheme for data propagation in the network is also presented, where a tree is used for such purposes. Also, in a later part of the paper, the authors present solutions to the routing in case of failures. Some optimizations are provided to improve the performance of TAG, these include methods such as snooping on network traffic to initiate aggregation in case of failures, hypothesis testing to reduce the number of messages sent over the network. Strengths: ---------- The main strength of the paper lies in the fact that the end user's life is made much easier, he/she can query the sensor network without knowing anything about TinyOS or details of how to program it! This is a very useful tool for scientists such as biologists and environmentalist, who would like to do more ``useful'' work without worrying about irrelevant details. Use of SQL type declarative language is a good choice as quite a few people know how to use SQL (especially scientists who handle databases). In my opinion, classifying the data into different aggregate categories is a good thing, as it was one important factor to come up with optimizations. The authors conducted extensive simulations and provided data from real deployments, which substantiates the paper and shows that TAG is a realistic algorithm. Weaknesses: ----------- In my opinion, the main weakness of the paper is that TAG is targeted to only a certain number of applications, namely data collection (with certain amount of data processing). For example, if we wanted to write a tracking style application, then TAG is not at all useful. Also note that most of the environmentalists would like to collect the raw sensory data (for scientific applications) and do any processing on a PC. It is not clear whether TAG is optimized for raw data collection applications. ****************************************************************** Paper title: DIFS: A Distributed Index for Features in Sensor Networks Summary: -------- This paper introduces another in network aggregation scheme, but different from TAG, as it considers high level queries. For example, ``Is a rabbit present in area A?''. In the authors' words, they do not consider searches over time series data, rather they look at searches over semantically rich high-level events. A spatially distributed index that provides efficient index construction and range searches is presented in the paper. Main ideas: ----------- The main idea of DIFS is that raw sensory data is generated, which is processed using various statistical and pattern recognition techniques, and the results are inserted into indices, which the user can query and find an answer to his/her query. The paper presents a classification (similar to the one in TAG) for higher level events. The four main categories in the classification are based on: sensor values (time series data and statistics), timing parameters (sequence and duration of events), spatial dimensions (physical shape and size of an event), and relationships between events (relations between events over space and time). DIFS builds on GHT (Geographic Hash Table), it introduces the notion of having multiple parents for a given node. This helps in providing efficient load balancing across nodes. The higher level nodes cover smaller value ranges detected within large geographic regions while lower level nodes cover a wider range of values from within a smaller geographic region. This enables DIFS to achieve load balancing and efficiently support range queries. Strengths: ---------- The main strength of the paper lies in the fact that any application can use DIFS, as they just need to store the event values (indexing) on the nodes. The sensor network can be queried and the results can be retrieved efficiently, because of the indexing provided by DIFS. It supports higher level queries such as queries for events, which TAG does not support. DIFS is efficient as load balancing is explicitly taken care of and the user can query for an event without worrying about writing code to do so! Weaknesses/Future Work: ---------------------- There is no implementation of the DIFS algorithm on realistic nodes, neither is there any proper simulation to show that in reality this algorithm works. Given the fact that the sensor nodes have very little memory, processing power, is it really feasible to do complicated pattern recognition and signal processing in the network and also store the data collected and respond to queries? A realistic implementation will prove this point! Also, the authors do not consider the losses in network, which might lead to frequent bursts of disconnections between nodes, thus making DIFS non-adaptive to such cases. From: Praveen Jayachandran Sent: Monday, February 20, 2006 5:15 PM To: Indranil Gupta Subject: cs598ig review 02/21 CS598IG Review 02/21/2006 Praveen Jayachandran DIFS: A Distributed Index for Features in Sensor Networks --------------------------------------------------------- This paper presents the design and analysis of a spatially distributed index that provides for efficient index construction and range searches. This is in contrast to prior work that focus on queries that use in-network aggregation of time series data. The basic data-centric storage architecture is extended to support range queries. New data structures to support high-level data abstractions are proposed. The authors propose a classification of the properties and relationships of high-level events into sensor values, timing parameters, spatial dimensions, and relationships between events. Events are stored locally where they are created, and information about their attributes are inserted into a distributed index. This distributed index is a multiply rooted hierarchical index, where non-root nodes can have multiple parents. Nodes store event information for a particular range of values within a particular geographic region. Higher-level nodes cover smaller range values, but a larger geographical area. Algorithms for inserting, querying for, and removing high level events are discussed. DIFS is compared with two existing distributed indexing schemes for sensor networks, namely Geographic Hash Table (GHT) and a quad tree approach. Their simulation studies indicate that DIFS performs best when the search queries are sufficiently constrained, and the number of queries posed is sufficiently large compared to the number of events generated. Comments: 1. As identified in the simulation studies, DIFS performs well only when the queries are sufficiently constrained. This could prove to be a serious drawback as deployment statistics of TAG suggest that most queries are not constrained, but rather general in nature. However, most of these deployments were research deployments and there could be real applications where DIFS could be useful. 2. The storage and maintenance overhead of DIFS seems to be quite high. 3. The performance of DIFS for binary queries is not investigated. 4. There are several inherent assumptions made in the design of DIFS as identified in the future work section. For example, the distribution of values stored in histograms is implicitly assumed to be uniform, and often the distribution of values will not be known apriori, leading to load imbalance. 5. DIFS assumes a uniform distribution of sensors in the field. TAG: A Tiny Aggregation Service for Ad-Hoc Sensor Networks ---------------------------------------------------------- TAG proposes an in-network aggregation service for sensor networks. Users can express simple, declarative queries, which are distributed and executed in a time and power efficient manner within the sensor network. The main contribution of TAG is its taxonomy of aggregate functions based on their properties such as duplicate sensitivity, exemplary or summary, monotonic, and size of partial state partial state. For each of these properties the authors discuss the extent to which aggregation is possible and how aggregation is to be performed. This taxonomy is used to support several optimizations and cost reductions. Optimizations to the basic scheme such as snooping, caching, probabilistic forwarding, and using multiple parents instead of a single parent are proposed to improve reliability of search results even in the presence of failures and to reduce communication cost. SQL is used as the declarative query interface. Through performance studies the authors demonstrate the advantages of TAG over traditional centralized methods. Comments: 1. TAG can be extremely fragile to failures. If a node that is high up in the tree fails, then all the data in the sub-tree rooted at that node would be lost adversely affecting aggregate results. Although some optimizations such as caching are proposed to counter this problem, they do not alleviate the problem entirely. Also, the errors would be larger for some queries than for others, making it difficult to detect. 2. The classification includes a class called content-sensitive aggregates, which is like saying 'everything else'. The authors abandon this class as no aggregation is possible. Are there alternate mechanisms by which approximate solutions can be obtained through aggregation, yielding considerable energy savings for little reduction in accuracy of reported results?