From: Jin Liang [jinliang@cs.uiuc.edu] Sent: Thursday, September 16, 2004 11:08 AM To: Indranil Gupta Subject: 598ig review 09/16 Gossip-based Ad Hoc Routing Routing algorithms in ad-hoc networks have more or less used flooding for route discovery, due to the fact that node mobility often makes it costly to maintain a consistent routing table. The goal of flooding is for the route request packet to reach every node in the system. However, flooding often sends more packets than necessary. This paper proposes to use gossip-style packet forwarding for route discovery, based on the theoretic results that gossips often exhibit bimodal property, which means either they die out pretty soon, or they reach all or most nodes with high probability. Using mainly simulation, this paper examines the performance advantage of gossip-style forwarding and proposes several heuristic techniques to further improve the algorithm. They found that in large networks, gossiping saves 25% to 35% packets, while the performance is comparable to flooding. Their heuristic techniques, such as adaptive forwarding probability, re-broadcasting, selective acknowledgement, etc., can further decrease the probability that a gossip dies out too soon, and prevents the source node from indefinite waiting. The main contribution of the paper is to apply the theoretical results of bimodal property of gossip to routing in ad hoc networks, and show by simulation that gossip style forwarding (and their heuristics) can indeed lead to less packet forwarding while achieving the same or even better performance. However, the paper did not address the problem of how to select the appropriate gossiping parameters, such as the forwarding probability. This potentially depends on the network topology. The paper only showed there exists a cutoff probability, but it is not clear in real networks, how can the nodes find the appropriate probability (the right probability may be different from those found in the paper, because the network topology may be entirely different) Epidemic algorithms for replicated databases. This paper presents several interesting algorithms for maintaining consistency among replicated databases. Databases are often replicated in order to provide high availability. However, since multiple sites can be modified concurrently, there must be an algorithm to maintain consistency among different sites, and to abort transactions if they are conflicting with each other. Previous work either use global synchrony, which is heavy weight, or considers modification of single object. This paper considers transactions that may modify multiple objects at the same time. Each site maintains a matrix clock to infer whether a transaction has been received by other sites. If every site has received the transaction but none has aborted the transaction, then it is committed. By using epidemic log propagation, efficiency of the system is improved while consistence is guaranteed. In addition to this pessimistic algorithm, the paper proposes several optimizations to improve the optimism. Write locks are released as soon as possible, and conflicts are detected later, possibly aborting some transactions that have not been committed. Bimodal Multicast This paper focuses on reliable multicast protocols. Previous work either uses virtual synchrony between group members, which has poor scalability, especially when a few members are slowed down, or uses packet retransmission for scalable reliable multicast, which has large absolute probability of failure detection when the network size is too large. Both work neglects stable throughput as a reliability metrics. This paper presents a bimodal multicast protocol that has good scalability, and predictable reliability even under highly perturbed conditions. The protocol has a number of good properties such as atomicity, throughput stability, message ordering and loss detection. The performance of the protocol is verified by comparing it with SRM under different network conditions. The main contribution of the paper is to present a different view on multicast reliability, and proposes the gossip style, or probabilistic multicast to solve the scalability and throughput stability problem even when the network is under high perturbation. From: Yookyung Jo Sent: Thursday, September 16, 2004 10:55 AM To: Indranil Gupta Subject: 598ig review 09/16 Gossip-Based Ad Hoc Routing Ad Hoc routings are generally based on some form of flooding. This paper applies gossip protocol instead. The key point is that Gossipping exhibits bimodal behavior in large-enough network (The bimodal behavior is theoretically proven for infinite network). With small probability, the gossip reaches very small portion of nodes and quickly dies out. With large probability, the gossip reaches out almost all the nodes. The probability of reaching intermediate number of nodes is vanishingly small. By tuning to a little above the threshold value p of a node gossipping to each of its neighbor, a message originated by a sender can reach almost all the nodes, but with substantially reduced traffic. In order to apply gossipping to ad hoc routing optimally, they introduced various heuristic optimizations such as the followings. Adjusting gossipping probability among two values, so that a node with few neighbors in ad-hoc network can have more probability to gossip, detecting the clue of a message dying out by checking whether the message is sent from the neighbors and if not broadcasting it, using retries and combining the protocol with zone approach. Epidemic algorithms for replicated database maintenance This paper introduces epidemic algorithms to replicated database maintenance. The existing approach of direct emailing (a node with new update directly emails to all other databases) is not reliable nor scalable. They introduce anti-entropy (every node chooses another node at random and resolves the difference by comparing the content of database) and rumor mongering(a node with update periodically chooses another node and notify the change in recursive manner, but if a node trying to notify the new updates found out that most of the responder already has the update, then it stops propagating the update). They use rumor-mongering as the main means of maintenance, and anti-entropy as the back-up, because rumor-mongering is more efficient but it does not guarantee the eventual consistency as anti-entropy does. They investigate various parameters of the algorithms, such as the relative advantage of push vs pull (push means that when a node with updates contact another node, it injects an update if the node has more recent copy, pull means the initiating node receives an update if the contacted node has more recent copy) in various setting, variation in heuristics (blind vs feedback, counter vs coin), how to manage data deletion (the trade-off between storage overhead of maintaining deletion history vs. the danger of deletion resurrected), how to adjust the algorithm to a realistic non-homogeneous spatial distribution of the network. From: Adeep Singh Cheema Sent: Thursday, September 16, 2004 10:54 AM To: Indranil Gupta Subject: 598ig review 09/16 09.16.2004 Cheema, Adeep S Epidemics cheema@uiuc.edu Gossip-Based Ad Hoc Routing Gossiping is probabilistic routing to reduce unnecessary propagation of messages thereby reducing network overhead. Very applicable to wireless ad hoc networks which suffer from low bandwidth, diverse channel conditions and limited power. Propagation is all or nothing – either message dies out very quickly or reaches most targets – bimodal behavior. Gossiping 35% drop-off in message overhead than flooding although performance varies greatly in p – the probability that a node forwards a message and also in k – the number of hops for which a message is sent to all neighbors. Degree of the graph (level of connectedness) has implications on the required value of ‘p’ for successful delivery. There is also interesting behavior at boundaries, since nodes there have fewer neighbors and fewer messages actually propagate to them from nodes beyond them since they are at the boundary. Several heuristics to improve gossiping: Setting ‘p’ to be a function of node degree works better than GOSSIP(p,k) demonstrating greater penetration with fewer messages and is very realistic since nodes must be aware of their degree. Using feedback from neighbors to assess how well the message is being forwarded. Retrying the protocol multiple times if success is not met instantly (due to bimodal nature). Using zones to reach distant nodes with greater probability although it requires more and current information to be stored at each node. In general, gossiping has a performance improvement over flooding with the trade-off that a low percentage of nodes will not receive the broadcast. Gossiping also seems to scale well with network size. Criticism: Gossiping cannot guarantee full penetration in a synchronous system without failures unless p=1 where it resorts to flooding. Epidemic Algorithms for Replicated Database Maintenance The goal is to develop scaleable protocols for distributed updates on a replicated database with claims of performance and consistency. A probabilistic, epidemic based approach is used to satisfy somewhat relaxed consistency requirements with an efficient update rate. Update time and network traffic generated are important performance metrics. Three means of replication discussed are: 1.Direct Mail: Nearly reliable but places a large load on the system – O(n^2) messages. 2.Anti Entropy: Pairs of systems randomly communicate and resolve differences. Fairly reliable but updates are expensive and convergence is slow. Mailing of data may still be required. 3.Rumor Mongering: ‘Infected’ nodes believe that they have an update of interest which they propagate till they are satisfied that the update has been reasonably distributed. Rapid updates with low traffic and several available optimizations. Deletion is propagated like updates using ‘Death Certificates’. Network topology can play a role in determining rate / extent of replication for epidemic based protocols. Experimentally, Rumor Mongering was able to perform and scale reasonably well. Bimodal Multicast Bimodal Multicast is an attempt to at a protocol with high guarantee of reliability while maintaining high scalability and stability of throughput. The protocol is also resilient against network turbulence and noise. Scalability is attained since, due to the nature of the algorithm, operations are independent of scale. There is deterioration of the penetration of the multicast with distance from the source however, but the only guarantee given by the protocol is that the message will reach many of the nodes though it may not reach all as long as it is a success i.e. the better of the two (bimodal) outcomes is attained. The pbcast (probabilistic multicast) protocol is discussed which comprises of a best-effort hierarchical broadcast (optimistic dissemination) and a two-phase anti-entropy (gossiping) protocol. The protocol satisfies properties of atomicity (from bimodal nature), throughput stability, ordering, multicast stability, lost message detection and scalability. The first stage of the protocol involves an unreliable multicast of each message as a quick way to deliver the message to many members. The second stage of the algorithm is the anti-entropy protocol, which aims to deliver the message to the rest of the nodes. As in the case of gossip based routing, there are several heuristics available which can enhance the speed, reliability of the basic protocol, with small or no overheads. Practical experiments show good reliability and scalability. From: Vartika Bhandari Sent: Thursday, September 16, 2004 10:40 AM To: Indranil Gupta Subject: 598ig review 09/16 Epidemic Algorithms for Replicated Database Maintenance This paper describes an epidemics-based approach to maintenance/updation of replicated databases. The context is that of the Clearinghouse servers of the Xerox Corporate Internet. The authors delve into 3 classes of solutions and analyse them. These are: 1. Direct-mail (each update mailed to all other sites immediately) - It leads to quick dissemination but does not scale well due to the potentially large trafic volumes generated. 2. Anti-entropy (sites regularly choose random partners with whom to resolve differences) - Although reliable, this approach involves complete database comparison. The comparison issue may be somewhat resolved by using hash signatures. Only if the signatures fail to match, is a full comparison required. Further improvement is possible via incremental signatures. 3. Rumor-mongering - This is the focal point of the paper. When a site first becomes aware of an update, it periodically communicates this information with another randomly chosen site. Over a period of time, as it starts encountering sites that already know the update, it loses interest in disseminating the information. The process is less costly and can be done more frequently. However some updates might never reach some sites with a non-zero probability. The authors analyze the rumor-mongering approach in greater detail. 3 modes vix. pull, push and push-pull are analyzed. Pull is deemed to be better than push (given uniform connectivity) via a mathematical argument on fast/slow convergence of susceptible sites to 0. This is intuitive, as once the update has reached are large number of sites (say a fraction 0.8), when susceptible site A chooses a partner, it is very likely an infective site, and so A has a good chance of "pulling" in the update. On the contrary, when infective site B chooses a partner, that partner will be a susceptible site with low probability and chances of "pushing" the update are lower. Handling of deletions via death-certificates is explained. In particular, the approach of using dormant death-certificates to reduce on storage is of interest. There is some very interesting discussion on how spatial distributions affect an epidemic approach. Since exchange of updates with nearby sites is less expensive, one might wish to do so preferentially. This can lead to increase in convergence times, but it is shown that if one exchanges updates with sites at distance d with prob proportional to d^-2, this yields good traffic and fair convergence characteristics. However this discussion looks at convergence latency in terms of rounds. If one considers individual link delays and rounds are not synchronized then a message sent to a near node in one round takes less time than one sent to a farther node in one round. In that case the delays should rise for messages sent to farther nodes, thereby making preferential updates with near nodes a clear choice. The paper provides excellent discussion of epidemics related approaches and the issues involved. However the spatial distribution argument could have been better exemplified with further examples and figures. Gossip Based Ad-hoc routing This paper proposes the use of gossiping rather than flooding in on-demand routing protocols for ad-hoc networks. The aim is to reduce routing overhead and thereby alleviate congestion/contention. The single-most salient feature of the paper is the discussion on the "Bimodal behavior" of gossiping. Bimodality implies that either a gossip reaches a very large fraction of nodes, else it reaches extremely few. This property ensures that either a message is well-disseminated, else it barely propagates. In the latter case, the cost of the failed attempt is very low, and a retry may be made with increased gossip probability if needed. Some heuristics to reduce chances of the gossip dying out quickly are also discussed. These include deterministic gossiping for a few hops etc. The authors modify AODV to use gossiping instead of flooding and show that the routing overhead reduces significantly. An interesting point to note is that the authors explicitly state that their notion of bimodal is different from the "Bimodal multicast" paper. However, the two papers seem to be using "bimodal" in exactly the same sense! It is debatable as to how gossiping would work in non-uniform networks (ref. the spatial distribution discussion in the epidemics paper). Of course, the situation is remediable by initiating a retry with higher gossip probability (at the cost of greater latency). Bimodal Multicast This paper describes a multicast protocol that provides a bimodal guarantee viz. that a multicast message either reaches most processes, else very few. The exact probabilities may be adjusted via tuning system parameters. The protocol shows stable throughput characteristics and maintains a FIFO ordering among messages. The protocol comprises two stages. The first involves an unreliable multicast primitive which is used for initial dissemination. Thereafter an anti-entropy protocol is employed to resolve discrepancies. The protocol first exchanges updates regarding recent messages to ensure that some processes do not permanently lag behind after recovering from a transient failure. Of course, this raises the issue of how FIFO order will be respected if recent messages get delivered first at the recovered process. Another issue is that of healthy processes retaining enough state for enough time for updates to be disseminated to recovered processes. The proposed approach is useful in that it represents an effort to bridge the gap between unreliable and totally reliable multicast primitives by providing probabilistic guarantees. From: Al Harris [aharris@cs.uiuc.edu] Sent: Thursday, September 16, 2004 10:23 AM To: Indranil Gupta Cc: aharris@cs.uiuc.edu Subject: 598ig review 09/16 Epidemic Algorithms for Replicated Database Maintenance Main Idea: For a replicated database, there are two main factors that must be considered when analyzing the update algorithm: the time required to propagate an update to all sites and the amount of network traffic generated by an update. The paper analyzes three methods of updating: direct mail, anti-entropy, and rumor mongering, the last two being epidemic algorithms. One concern with epidemic algorithms is the amount of network traffic. Uniformly distributing neighbors that are to be infected causes rapid spread of the updates, but generates a lot of network traffic, therefore alternative distributions need to be found. The direct mail method seems to have two main problems, first it creates a lot of network traffic, proportional to the number of replication sites times the distance between these sites. Second, it can fail as the network fails, the basic method requires some kind of reliable transport for the update messages to keep the system in sync. Anti-entropy can be used to correct errors by the direct mail system. When used like this, it is important to design the algorithm correctly (push, pull or push-pull). There are a number of ways to do rumor mongering. The basic idea is that when one machine gets an update, it begins to share that update to random neighbors until it becomes disinterested, either because it shares the update with a machine that already has that update, or via some other rule for become removed. With rumor mongering, there is still a probability that the system will come to a stable state without all machines getting the update, one solution to this problem is to back this up with infrequent anti-entropy. One other issue to deal with is the deletion of items. To prevent old copies from being written to servers with deleted copies by epidemic algorithms, some sort of "death certificate" is needed. Gossip-Based Ad Hoc Routing Main Idea: Ad Hoc networks are energy constrained. A common way to attain route information is via flooding, however, this cause unnecessary traffic, which is expensive in terms of energy. Therefore, it is beneficial to apply epidemic algorithms to attempt to perform route discovery without flooding. However, most previous gossip algorithms assumed that each node had a route to every other node. It was then a matter of picking some subset of nodes to gossip to. In the ad hoc routing case, the algorithm is attempting to find routes. Basically, nodes will forward route requests to neighbors with some probability p. There are two basic gossip schemes discussed along with some schemes to use other protocols along side them. In general, the authors show that messages can be saved by using gossip messages. Such a protocol essential helps alleviate the flooding costs of routing protocols. ZRP (which the authors talk about) has the same goal. Of course, if the network topology/mobility is such that flooding is not needed very often, then there will be few savings. As a proof of concept, the authors add their gossip mechanism to AODV to show that the number of messages needed for route discovery is reduced without significant increases in latency, etc. Bimodal Multicast Main Idea: This paper applies the same ideas to multicast. The idea is that instead of using a standard "reliable multicast" idea, an epidemic algorithm can be used to yield some very high probability that all the intended receivers will get the information. This can be achieved while maintaining a stable throughput. The paper then presents a number of optimizations. Basically, this is another way epidemic algorithms can be used. Really the key is that any time we have a need to have virtually complete distribution of information with minimal bandwidth usage. This is the common thread. In order to be able to use epidemic algorithms, a probablistic reliability scheme must be sufficient. From: Jintae Kim [jkim_20@hotmail.com] Sent: Thursday, September 16, 2004 9:43 AM To: Indranil Gupta Subject: 598ig review 09/16 CS598IG Review 9/16 Jintae Kim (kim28@uiuc.edu) Epidemic algorithms for replicated database maintenance, A. Demers et al, PODC 1987. This paper introduces three algorithms to maintain the replicated databases in a distributed setting: direct mail, anti-entropy, and rumor mongering. The basic setting is as follows. There are databases replicated at thousands of sites. Updates are injected at a single site, and they should be propagated to all other sites over unreliable networks until contents of all replicas will be identical. The idea of direct mail is that a single site mails the update to every other site. This method is efficient, but not entirely reliable, which leads the necessity of epidemic algorithms. In anti-entropy method, sites pick random partner and exchange database content and resolve differences. In rumor mongering, nodes spread hot rumors to other random nodes until the rumor becomes cold. Since not all nodes may have heard rumor by the time it becomes cold while updates are spread rapidly, this method can be combined with anti-entropy to achieve eventual consistency. These algorithms also require spatial distribution to spread information more efficiently. Since network itself is not uniform and some critical links face larger traffic, the algorithm should incorporate the network topology. Specifically, it turns out that rumor mongering is sensitive to spatial parameters. This paper is one of the earliest works in computer science that adopts epidemic algorithms. It successfully presents the properties of anti-entropy and rumor mongering, and shows that it can be applied to efficient maintenance of distributed databases, which goes well along with current peer-to-peer systems. The techniques discussed in this paper are rather fundamental, and there are more issues to be resolved, such as scalability, and more methods for further enhancement. But despite all of those, this paper is a very interesting and meaningful seminal work. Gossip-based ad hoc routing, Z. Haas et al, Infocom 2002 This paper presents different variants of epidemic algorithms to improve the performance of ad hoc routing. The motivation of this paper comes from the fact that there are many unnecessary messages propagating over the networks in the flooding-based routing. Authors make some optimization on the original epidemic algorithm to reduce the possibility that none of them ends up with gossiping at all. The rest of this paper is to set the parameters, observe the results in different graphs of networks, and find methods to improve the performance. The experimental results show that the gossip probability can be as low as 0.65 over various grids or random graphs, which means up to 35% traffic can be reduced compared to flooding. Other optimization techniques, such as two-threshold scheme (gossip probability dependent on the degree), preventing premature gossip death, retries, zones, are used to improve the performance of gossiping. They also incorporated this method with AODV. While it shows performance improvements even in small networks of 150 nodes in ADOV, the paper claims that it should still be useful in large networks. This paper shows how the epidemic algorithm can be applied to ad hoc routing. Although this idea is not the first to use gossiping, the result looks promising in terms of performance improvement. It would have been much better if they could give the experimental results in large networks, especially considering congestion issues. Bimodal multicast, K Birman et al, ACM TOCS 1999 There had been two types of reliable multicast. One class supports strong reliability at an expense of limited scalability. The performance under congestion becomes unpredictable and the throughput is degraded under transient failures. On the other hand, there are best-effort reliable solutions. It makes a reasonable effort to overcome any failure from a participating process. It gives better scalability than the first class, but has unpredictable message guarantees when the system is highly faulty. Bimodal Multicast claims to be a solution to both types, in a sense that it is both scalable and has predictable reliability. This algorithm first uses an unreliable, hierarchical broadcast to make a best-effort attempt at delivering messages. Then the algorithm uses a second protocol that attempts to fix any mistakes made from the broadcast. To do this, gossip messages are exchanged to find out any missed messages by any processes. If so, they will send responses to the gossip messages to ask for the lost messages. There are also several optimization techniques to reduce unnecessary communication. It helps limit its cost under failure scenarios. The system shows, through the experiments, scalability and survivability under stress, as the authors had claimed. Also the formal models support such strengths of this protocol. I think this paper is also a meaningful work that successfully employs epidemic algorithms to another area: multicast. It incorporates both scalability and reliability in multicast very well. The stability of protocol is excellent even under heavy perturbation with low overhead, and the throughput is very stable. However, when there are a large number of nodes, the latency under a heavy perturbation remains to be seen, considering the random selection of the targets when gossip messages are sent. From: Jungmin So [jungmin.so@gmail.com] Sent: Thursday, September 16, 2004 9:39 AM To: Indranil Gupta Subject: 598ig review 09/16 Paper Review - Sept. 16 - Jungmin So (jso1@uiuc.edu) All three papers reviewed here uses epidemic algorithms for different applications. Epidemic algorithms rely on the result from literature of Epidemiology which says that an epidemic disease spreads out very quickly even with small number of contacts between pairs. This fact is applied to the computer science problems where an information needs to be spread out from one source to a large number of other members in a group. The epidemic algorithms are reliable and scalable, which nicely meets the need of large scale broadcast or consistency maintenance. Demers et al. applies epidemics for replicated database maintenance [1]. The goal is to maintain consistency among databases that are replicated in large number of distributed nodes. The paper mentions three possible approaches to achieveing this: direct mail, anti-entropy, and rumor mongering. The direct mail approach, where a source node sends a message to all other nodes whenever there is an update, is very expensive. Also, if a node does not have knowledge on all other nodes, this may not be reliable as well. The anti-entropy and rumor mongering methods are epidemic approaches. In the anti-entropy algorithm, each node periodically chooses another site randomly and exchange database contents so that the difference is recovered. It is very reliable according to the results from epidemiology, but to examine the database contents in every contact is costly. The last approach is rumor mongering, in which updates spread out as a "hot rumor". So whenever a node holds a hot rumor, it periodically chooses another nodes and sends the update to ensure that the receiver has heard the hot rumor. Now this periodic message exchange on a hot rumor must be stopped when every node has heard the rumor. So there is a mechanism for making senders lose interest, so that the spread of rumor is stopped at some point of time. Rumor mongering approaches is efficient in terms of cost, but because of the randomized aspect of the algorithm, it is possible that some nodes do not hear about the update. So the protocol suggested in this paper uses rumor mongering but also uses anti-entropy at a lower frequency to increase reliability. This paper also considers deletion of an information. Because of the anti-entropy methods, information tends to be recovered even if it needs to be deleted from all databases. For this reason, a death certificate is used to resist this and completely delete the unwanted data. There is a lot of optimizations that further reduces cost and improves reliability. One is to exploit the topology. It is beneficial to choose contact nodes according to the spatial distribution, instead of a uniform fashion, because the average number of hops can be reduced. In [2], the epidemic algorithm is applied to ad hoc routing. The main goal is to reduce the cost of broadcast which is mainly used for on-demand route discovery. The algorithm is simple: when a node receives a broadcast message, it forwards the message with a probability p. Now the epidemic algorithms have a bimodal behavior. If p is larger than a threshold value, then almost all nodes gets the broadcast message with high probability. If p is smaller than the threshold value, then only a small fraction of nodes receives the broadcast message. The threshold value depends on the underlying topology. So with p set to a reasonable value, the algorithm achieves both scalability and reliability in broadcasting messages. There are optimizations added to this algorithm. The first one is to enforce initial propagations before gossiping. This is because the source node may happen to be a sparse area, which leads to a quick die out of gossip message. The second one is to have different probability according to the number of neighbors. The third one is to consider the number of neighbors that received a broadcast message. So even if a node chooses not to forward a packet, after listening to its neighbors forwarding the packet, it may change its mind and forward the broadcast message. [3] uses epidemic algorithms for multicast. Here, an anti-entropy mechanism in which members choose other members to correct inconsistencies is used is used with unreliable IP multicast to improve reliability at a low additional cost. It is effective for applications that requires throughput stability, as well as atomicity because the algorithm has a bimodal characteristics. The epidemic algorithms used for propagating information achieve high reliability at a reduce cost compared to a deterministic algorithms. This is achieved at the cost of latency, and "chances" that some nodes may not receive the information. These algorithms can be applied to applications that requires scalability and have some tolerance to some latency in information propagation. [1] A. Demers et al, "Epidemic Algorithms for Replicated Database Maintenance", PODC 1987. [2] Z. Haas et al, "Gossip-based Ad Hoc Routing", Infocom 2002. [3] K. Birman et al, "Bimodal Multicast", ACM TOCS 1999 From: Michael Treaster [treaster@cs.uiuc.edu] Sent: Thursday, September 16, 2004 8:29 AM To: Indranil Gupta Subject: 598ig review 09/16 Epidemic Algorithms Epidemic Algorithms for Replicated Multicast Database Maintenance The first paper of the three in chronological order relates specifically to synchronizing database replicas on separate machines. This work was published in 1987 by researchers at Xerox. It discusses three possible approaches to propagating changes to one replica, describes problems with these approaches, and addresses them. Of particular interest was the use of death certificates with their two stages of expiration. The authors describe a problem with the dormant death certificates where, if the system scales to too large a size, catastrophic failure will be the result. One possible solution would be to propagate the operations on a database rather than propagating the contents of the database. This removes the difficulty where an item deleted on one replica is replaced by a replica propagating its contents. The algorithms described in this paper might be adapted to share changes in a peer-to-peer distributed database system. The contents of the database would be replicated on multiple machines, and all replicas containing an entry would need to be updated when the entry was changed. This problem might be similar to that considered by the authors. One would need to consider scalability issues, however, since some of the authors' techniques do not scale. Bimodal Multicast The second paper chronologically describes a multicast protocol with quantifiable reliability in terms of multicast throughput stability. The origins of this work can be traced back to the first paper. It identifies the bimodal characteristic of gossip algorithms, showing how such algorithms terminate with either very few nodes receiving the message, or with almost all of the nodes receiving the message, but only rarely does the outcome fall in between these two extremes. The algorithm in this paper has two basic parts. First, the message is multicast along a randomly selected spanning tree of the nodes in the group. The authors assume that membership of all nodes in the group is known. This may be true for some problems and not true in others, but either way this design decision limits the scalability of the solution. Since the communications in phase 1 are prone to error, phase 2 is intended to detect and correct inconsistencies in the network. This is accomplished using an anti-entropy algorithm based on that from the first paper. This is the most interesting part of the paper, since the authors have made a variety of improvements over the algorithm from what was done at Xerox. The improvements are intended to suppress fluctuating communication patterns in the network. The authors provide experimental evidence to demonstrate that their improvements to the algorithm result in a noticeable improvement in performance, and in particular reliability. Gossip-based Ad Hoc Routing The bimodal nature of gossip algorithms is also discussed by the third paper. This paper proposes improvements to the algorithms in terms of efficiency while maintaining a high probability of reaching most of the network with a message. These improvements are: 1) varying the gossip probability based on the number of neighbors; 2) estimating the propagation of the message and re-broadcasting the message if the the originator believes too few nodes have received it; 3) detecting when a route to a destination (timeout) cannot be found and re-broadcasting the message to try again; and 4) maintaining a zone around each node to facilitate more efficient transfer of messages across the network. The authors demonstrate the improved effectiveness and improved efficiency of these techniques with experimental data gathered in simulation. (2) and (3) are especially interesting because they are means by which individual nodes can learn about the entire network without actively querying large numbers of other nodes. (1) might be improved even further by using a continuous function of the number of neighbors rather than a step function when deciding how to increase the probability of a node forwarding a message to another. The third paper focuses on ad hoc networks as the context for the algorithm it presents. However, the simulations performed by the authors represent the networks as 2-dimensional distributions of nodes. It might be interesting to study if and how gossip protocols might be improved when communicating across a 3D distribution of nodes. One example of such a situation might be in a collection of cooperating UAVs flying at different altitudes, or a sensor network on rugged terrain. From: William Conner [wconner@glsn33.ews.uiuc.edu] Sent: Thursday, September 16, 2004 7:37 AM To: Indranil Gupta Subject: 598ig review 09/16 EPIDEMIC ALGORITHMS FOR REPLICATED DATABASE MAINTENANCE Anti-entropy (simple epidemic) and rumor mongering (complex epidemic) algorithms are presented that allow a database replicated at many sites to maintain mutual consistency among the sites despite updates taking place at individual sites. In such a scenario, an update at one individual site must be propagated to the other sites. The epidemic algorithms are randomized algorithms for propagating these updates. When a site inquires about an update (pull) or is offered an update (push), it checks whether or not its timestamp is more recent than the timestamp of the update. If its timestamp is more recent, then it does nothing. Otherwise, it sets its value to the value of the update. In a simple epidemic, one site randomly chooses another site and synchronizes their entire database contents. Epidemic theory shows that if one site is updated, then a simple epidemic will eventually cause every other site to be updated. In a complex epidemic, all N replicas start out as susceptible to an update. After one of the N sites receives an update, it becomes infected and shares its update with a random number of other sites. After a certain period of sharing, each infected site becomes removed (i.e., although it knows of the update, it no longer propagates the update). Unlike a simple epidemic, a complex epidemic can only state that updates will reach every site with a certain probability. This is due to the fact that there is a possibility that some sites might remain susceptible to updates after all infected sites have been removed (this is referred to as residue). Since anti-entropy is so slow, it is only suitable for use as a background process which will ensure that updates reach all sites even if rumor mongering fails. It would be interesting to find out whether the additional overhead from more aggressive rumor mongering alone would be less expensive than running a less aggressive version of rumor mongering with anti-entropy in the background. If more aggressive rumor mongering is less expensive, then that could eliminate the need for anti-entropy in the background. Of course, future work at the time of this publication would include the many network applications of epidemics, such as epidemic multicast. GOSSIP-BASED AD HOC ROUTING Gossip-based ad hoc routing is an alternative to the flooding of routing messages in multi-hop wireless networks with no fixed infrastructure. In such networks, the wireless bandwidth is limited as well as the power of individual nodes. Therefore, it is important to search for alternatives to flooding that will avoid propagating messages unnecessarily. The authors found that gossiping protocols applied to ad hoc networks exhibit bimodal behavior: either hardly any nodes get the message, or most of the nodes receive the message. The basic gossiping protocol, GOSSIP1(p,k), means that a source sends the route request with probability 1 and the first k hops will also broadcast that request with probability 1. After the first k hops, a node will broadcast that request with probability p and discard that request with probability (1 - p). Any duplicate requests are also discarded. With this approach, (1 - p) messages are saved from being unnecessarily propagated. To avoid a situation where none of a node's neighbors broadcast a request, GOSSIP2(p1,k,p2,n) was developed. In this protocol, p1 is that same as p in GOSSIP1 and k is the same as k in GOSSIP1. In GOSSIP2, a node will broadcast a request with probability p2 (which is greater than p1) if it has fewer than n neighbors. To prevent a message from dying too soon, GOSSIP3(p,k,m) was developed. The parameters p and k are the same as in GOSSIP1. However, if a node receives a message that it doesn't broadcast and this message isn't received from at least m of its neighbors in a certain timeout period, then it broadcasts the message to all of its neighbors. The performance of these gossip ad hoc network protocols depends on choosing the right parameters. Choosing the right parameters involves each node gathering topological information (which is constantly changing in real life ad hoc networks with mobile nodes). Gathering this topological information could introduce additional overhead. Also, it is not clear whether the reduction of control traffic by up to 35% is worth the longer routes (10-15% longer) that occur from gossiping rather than flooding. Future work would include finding an efficient way to choose good parameter values. BIMODAL MULTICAST Bimodal multicast is another gossiping, epidemic-like protocol. The authors point out that there are two notions of "reliability" in reliable multicast. The throughput of strong reliability protocols often suffers due to the mechanisms that are in place to ensure reliability. Therefore, strong reliability protocols usually aren't scalable. Best-effort reliability provides weaker guarantees, but it is scalable. The bimodal multicast protocol (called pbcast) scales well and provides predictable reliability under highly perturbed conditions. More specifically, the properties of pbcast are atomicity (i.e., almost all or almost none), throughput stability, ordering, multicast stability, detection of lost messages, and scalability. As proposed by A. Demers et al., pbcast runs two protocols (one for unreliable multicast and another in the background to handle any failures of the unreliable multicast). The protocol running in the background is an anti-entropy protocol that sends summaries of message histories to the selected process. It should be noted that pbcast's anti-entropy protocol focuses on recent messages by sending retransmissions in reverse order. It also gives up on old messages. The reason for this characteristic of the protocol is to prevent nodes that fail often from slowing down the rest of the multicast group. One possible problem with pbcast is that it might be unsuitable for replicated databases trying to achieve mutual consistency. Pbcast's notion of atomicity (i.e., almost all or almost none) is different than the traditional database notion of atomicity (i.e., all or nothing). The fact that this anti-entropy protocol can give up on old messages could definitely cause problems in a replicated database system (or any other system that requires complete consistency). The two implementations of pbcast also lack scalability since they only support several hundred nodes. Of course, there are many applications where this isn't a sufficient number of nodes. As the authors mention, future work would include managing multicast dissemination routes. From: Zahid Anwar [anwar@ncsa.uiuc.edu] Sent: Thursday, September 16, 2004 7:19 AM To: Indranil Gupta Subject: 598ig review 09/16 Zahid Anwar NetId: anwar CS598IG Review Epidemic Algorithms For Replicated Database Maintenance ------------------------------------------------------------- Summary ------- This paper presents randomized, epidemic algorithms for distributing updates in a replicated database to approach consistency. It analyses performance of two random epidemic algorithms (anti-entropy and rumor mongering) and implements algorithms in simulation and on Xerox Corporate Internet to measure rate of database consistency and network traffic. It emphasizes importance of spatial distributions for efficiency. Strengths --------- + Proposes a good solution to a very common problem of how to replicate a database across many sites while maintaining consistency? Compare with the naive approaches where each host is responsible for propagating their updates directly to all other hosts (subject to churn) and primary site update. + Using peer-to-peer randomized algorithms to disseminate updates in the network like an epidemic does not require full knowledge of network at any single host and works well with unreliable message delivery + Updates spread rapidly as more sites become “infected” + Expected time for update to propagate to n hosts using push is logarithmic Weaknesses ---------- - Harder to achieve consistency with randomized algorithm - How do we avoid generating tremendous network traffic? - Requires spatial distributions to efficiently spread information - Spatial locality interferes with the rate at which we achieve consistency - How does the systems scale as more more hosts (and updates) enter the system? Future Work ----------- Perhaps we can achieve better performance by replicating only summaries of data and propagating updates in a hierarchical manner. Review Gossip-Based Ad Hoc Routing ---------------------------------- Summary ------- The paper proposes a a gossiping-based approach to reduce the overhead of the routing protocols. Due to the changing and high variability nature of ad hoc networks, routing traffic can be a significant overhead and impediment to application performance. Many ad hoc routing protocols are based on (some variant of) flooding. Despite various optimizations, many routing messages are propagated unnecessarily. Strengths --------- + The protocol uses up 35% fewer messages than flooding with improved performance + Can be used with any algorithm + The approach can be combined with with various optimizations of flooding to yield further benefits. + Good performance improvement with AODV and even in networks as small as 150 nodes Weaknesses ---------- - Ungraceful degradation no forwarding at all when receiving requests from m neighbors (Gossip 3) Future Work ----------- Come up with an adaptive scheme for choosing m and when to broadcast a request in Gossip 3. Bimodal Multicast ----------------- Summary ------- Bimodal Multicast is a protocol developed by Ken Birman at Cornell. It is based on a gossiping concept developed by Cornell Professor Alan Demers while conducting research at Xerox PARC. This paper was the first to use the gossip concept at very high speeds. Prior to it, gossip systems were assumed to be inadequate for high speed multicasting. Bimodal Multicast proves messages can spread in milliseconds, debunking the assumption saturation would take several hours. Bimodel multicast is basically an epidemic protocol which functions by randomly sharing information amongst neighbors. It places itself in the middle of two 'schools' of reliable multicast - virtual synchrony and SRM. The tools using this are Astrolabe, Gravitational Gossip and Anonymous Gossip. Discussion ---------- + Epidemic approach of Bimodal Multicast generates short-range dependent traffic with low overhead traffic and transport delays. O Ozkasap, M Caglar, "Network Traffic Properties of Bimodal Multicast Protocol" + A pure peer to peer multicast system. All members talk amongst themselves, without the need of a server directing traffic. + The gossiping nature guarantees that with very high probability almost all or almost none of the messages will be delivered. + It is extremely unlikely that half of the members will receive a message while the other half does not. + The gossiping nature also allows messages to be multicast even under high packet loss conditions. + In addition, Bimodal Multicast does not require the identification of the sender or author of a packet, allowing for anonymous messaging. + Protocol is also very stable under steady load even if 25% of processes are perturbed (slow) + Scalable in much the same way as SRM From: Pei-Hsi Chen Sent: Thursday, September 16, 2004 6:20 AM To: Indranil Gupta Cc: Pei-Hsi Chen Subject: 598ig review 09/16 Pei-hsi Chen pchen14 --Epidemic Algorithms for replicated database maintenance This paper describes several randomized algorithms for distributing updates and reaching consistency for database maintenance. Three methods examined include: 1. Direct mail. 2. Anti-entropy. 3. Rumor mongering. Direct mail is nearly reliable but has basic fallibility of the mail system. The last two are epidemic processes. It is shown that distributions for both of them converge nearly as rapidly as the uniform distributon while reducing the average and maximum traffic per link. To design a "good" epidemics, three major concerns are 1.Residue, the remaining susceptibles when the epidemic finishes. It can be arbitrarily small. 2.Traffic for database updates between sites. 3. Delay. Also, two ways can be used: push and pull. For pull, it converges sooner, but when the database is quiescent, it causes fruitless traffic. So it benefits with high enough update rate. In contrast, push converges slower, but less traffic. When connection limit is used, push gets significantly better, while pull gets worse. In order to reduce the residue to zero instead of just arbitrarily small, use complex epidemic as initial distribution of updates with anti-entropy as a backup mechanism. Once a missing update is discovered when two sites perform anti-entropy, we can only let these two sites become consistent or redistribute the update. For deletion, dormant death certificates can efficiently prevent obsolete updates from resurrection with a reasonable storage cost. Spatial distributions are suggested to reduce communication costs. The probability of connecting to a site is decided according to the distance to favor nearby neighbors. Simulation results shows that although it takes more time for spatial distribution to converge, the traffic on critical links is significantly reduced when compared to uniform distribution. The following are some thought related to this paper. 1. For push to reduce useless traffic: for each node with a specific update message, memorize those nodes it has contacted and those nodes it has heard from. Randomly choose a node from which are not in them when next time trying to send the update message. 2. For pull, adjust the frequency to pull dynamically. When the database is nearly quiescent, decrease the frequency to near zero; increase the frequency according to update rate of the database. 3. The performance of spatial distributions seems will depends in part on the topology of the network. That is, if a group of nodes are far away from others, the update message may hardly propagate to the whole network. --Gossip-Based Ad Hoc Routing This paper proposes a gossiping-based approach for routing in Ad hoc network. Compared to flooding, this approach may use up to 35% fewer messages. Gossiping exhibits bimodal behavior: in some executions, the gossip dies out quickly ; in others, almost every node gets the message. This paper's goal is to make the gossip message spread out in whole network while also keeping the gossip probability low. In order to prevent from dying out soon when source has relatively few neighbors, it is suggested to gossip with probability 1 for the first k hops. From the experiments, we can see the bimodal behavior. It also shows that in finite networks nodes that are close to the boundary have dropoff in probability. That is due to two boundary effects: 1. fewer neighbors. 2. no back-propagation. One graphs from Experiments also shows that once p is bigger than some point, the probability that an execution successfully spread out the gossip message increases rapidly. The effect of the degree of the network on gossiping are tested too. With the same gossip probability, a higher- degree network performs better than a lower-degree network. Some optimizations are 1. make the gossip probability at a node to be a function of its degree. That is, a node with fewer neighbors has higher probability to propagate the gossip. 2. After receiving a message and deciding not to broadcast it, if then this node doesn't get many copies of the message, it broadcasts the message after the timeout period. 3. Retry the protocol if a route is not found. 4. Combine gossiping and the idea of Zones. It has more significant impact in small networks but not in large networks. Some comparisons between these two papers above: For epidemics in database usage, a node periodically chooses another node randomly to communicate, while in gossip of ad hoc routing, a node broadcasts a message to all its neighbors with probability p. The former one has no boundary effect, because every node can connect to any other nodes, while the latter one is for ad hoc networks, this assumption is impossible because of resource limitation such as limited power supply and wireless bandwidth. --Bimodal Multicast This article treats stability of multicast delivery as a basic reliability property. Multicast protocols usually have a tradeoff between scalability and reliability. Bimodal multicast (Pbcast) protocol can both scale well and provide predictable reliability. Moreover, compared to other reliable protocols, bimodal multicast remains more stable when bursts of packet loss occur. Pbcast is composed of two sub protocols: 1. An unreliable hierarchical broadcast. 2. A two-phase anti-entropy protocol. First of all, a message is broadcasted using a randomly selected spanning tree. After that, some members may not receive the message, and then the gossip-based anti- entropy protocol executes. Each member randomly chooses another member to send a summary of its message histories. Once a member discovers new messages it did not see before, it requests a copy of those messages from original senders. The goal is to converge toward identical histories. When a message becomes old enough, it will be given up and marked as "lost". This protocol prioritizes recovery of recent messages to let processes with transient failures to catch up with other processes and to prevent slowing down the overall system. This multicast protocol can be built in many realistic network environments, because it meets important requirements not addressed by pervious reliable multicast protocols. The idea is basically the same as the backing up a complex epidemic with anti-entropy in another paper. From: Ellick M Chan Sent: Thursday, September 16, 2004 3:25 AM To: Indranil Gupta Subject: CS598ig review 9/16/2004 Ellick Chan CS598ig 9/16/2004 Discussed with Jigar Doshi [jdoshi2@uiuc.edu] General Observations: -Almost all or none w.h.p., rather than all or none delivery -Multicasting possible even with high packet loss -Avoidance of excessive traffic -Hard to reach consistency -Worries about network flood -Sensitivity to spatial parameters, spread probability p -Tradeoff between spatial locality and spread rate Epidemic algorithms for replicated database management: Xerox Parc developed an advanced probabilistic replication algorithm for synchronizing remote databases because of deficiencies in existing replication techniques. Instead of relying on deterministic methods, which are often very bandwidth intensive, they opted for probabilistic gossip algorithms that relied on epidemic principles. They found that these algorithms generally provided better results. This paper provides several contributions. First, the authors carefully analyze the various methods of spreading, including direct mail, anti-entropy and rumor mongering. They examine the pros and cons of each approach, and mathematically prove the effectiveness of the probabilistic approach along with solid experimental data. They also give optimizations such as checksum-based rules that can help optimize anti-entropy. Finally, the issue of real-world networking is addressed through connection limits and node preference via spatial distributions. Comments/suggestions: Performance boost possible through summary of replication data (Bimodal multicast) [http://www.cs.cornell.edu/courses/cs614/2004sp/slides/Epidemic.pdf] Gossip-based Ad Hoc Routing: Gossip-based ad hoc routing is used to help ensure all nodes get a multicasted message in an unstructured system. This technique helps reduce wasted bandwidth caused by flooding a network to ensure delivery. The basic idea is to propagate messages received by a node based on certain policies. A simple policy is to have a hops to live counter on a message, and stop propagating once the counter reaches zero. Policies that are more complex are based on probability and information such as message repetition count. The key to making this system work well is the choice of nodes to spread to, probability of spread, and taking advantage of network topology. The paper suggests that back propagation of routes can help to establish more optimal spread. Another optimization is making the probability of spread, p, a function of the network characteristics of a node, such as degree and bandwidth. The authors suggest that high mobility of a node will help spread information faster. -In systems with nodes in motion, a node may not get an update if the node moves too quickly Some students discovered some better parameters for p, based on adaptation. The link is: [http://www.cs.sfu.ca/~zshi1/personal/projects/ARGR%20presentation_885.ppt] Bimodal multicast: Bimodal multicast falls in between strong and best effort consistency models. It uses two phases, repeated multicast, and anti-entropy. Repeated multicast ensures that most nodes get the message, while anti-entropy relies on message digest messages to inform nodes of messages that they were supposed to receive. Nodes that have not received the message then solicit these messages from other nodes. This hybrid solution ensures speedy operation, scalability and elimination of lag associated with highly consistent models such as Isis. From a network point of view, Bimodal multicast has constant load on participants and communications as well as tunable reliability through parameter changes. Most importantly, it ensures steady data delivery rate with low changes in throughput, even with moderate packet loss and partitioning. One argument the authors make is that requests for missing messages should favor more recently timestamped messages. This approach can lead to loss of messages just at the threshold of the message queue window. In addition, bimodal multicast does not remember routes, nor does it detect bad links. One possible improvement may be the use of parity or error correction codes to recover lost messages. Each digest can include a constant-sized parity message, which can recover missing messages. This approach may work well on sensor networks, or when there is moderate packet loss to avoid excessive network usage. Another improvement is to favor supernodes, or more powerful nodes to solicit missing messages from. Supernodes may queue more messages than normal nodes and have better network connections to other nodes. From: James Richard Newell Sent: Thursday, September 16, 2004 1:47 AM To: Indranil Gupta Subject: 598ig review 09/16 Epidemics: Epidemic Algorithms, Bimodal Multicast, and Gossip-based Ad Hoc Routing James Newell 9/15/2004 These three papers obviously had a strong emphasis on epidemic or gossip-style routing algorithms for message multicasting and node updates. While each paper saw a slightly different use for the algorithm (network consistency, stability, or ad hoc networks) they used fundamentally similar techniques with slight variations to achieve these goals. They all boil down to the fact that this algorithm is simple, light-weight and can broadcast messages to the majority of nodes with provable high-probability. The first paper Epidemic Algorithms for Replicated Database Maintenance by Demers, Greene, Hauser, Irish, Larson, Shenker, Sturgis, Swinehart, and Terry is an older paper that lays the foundation for epidemic algorithms in common networks. Their interest in epidemics seems to stem from the problem of keeping their servers consistent at PARC without running the network into the ground. They discuss two different epidemic methods for updating their network: anti-entropy and rumor spreading. Anti-entropy is a method where two neighbors will randomly talk to each other to exchange database differences. While reliable, anti-entropy unfortunately requires lots of data to be exchanged to be effective and is quite slow at convergence. The two variations of anti-entropy revolve around whether the asker updates the asked or vise-versa (push/pull). Rumor spreading is a more traditional epidemic protocol, where an active site will choose a random neighbor and notify it of the update, if it hasn't seen the update the neighbor will now become active. If it has, the active state may become inactive after either a threshold or some probability. There are many variations of this protocols which are listed in the paper that have differing effects on the protocol's probabilistic behavior. The authors then go on to show how the protocols perform with spatial requirements in mind and different network topologies. Unfortunately, there appears to be pathological network configurations that provide performance problems to the above protocols. Resolving this was still open for research at the time the paper was written. This paper seemed to be a first good step for epidemic protocols. It is interesting to see that the actual protocol itself didn't change much in the later papers from this original paper. The authors however don't actually analyze the performance and nuances of its behavior with as much depth as the later papers. They also seem to only use idealistic conditions (late at night) and only to their environment. Furthermore, I wasn't sure if they made it clear how they obtain the data that they did (which simulator, parameters, techniques, etc.). Fortunately, they can get away with this since the core of the paper revolves around theoretical ideas and equations. The next paper is Bimodal Multicast by Birman, Hayden, Ozkasap, Xiao, Budiu, and Minsky. This paper proposes a version of the epidemic protocol that is both reliable and stable. This means that it can provide predictable reliability even under heavy network stress along with strong data throughput. This allows their network to scale where protocols with stronger reliability guarantees cannot. Therefore, this protocol is beneficial to systems that require constant reliable fresh data, but are flexible to minor data loss (media streams, air traffic controllers, etc.). It also can be used in conjunction with other protocols depending on the specific function of the implementation. The biggest "idea" that is put forth in this paper is the idea that their two-protocol implementation (IP-Multicast and Anti-entropy) exhibits a bimodal behavior where with high probability either hardly any nodes receive the message or most nodes receive the message. The "worst case" would be considered when about half receive the message causing network-wide agreement to be difficult. They also show that network stability can be improved by giving priority to freshness instead of hard reliability. This prevents a node or the whole system to lag due to missed messages. The authors then provide some graphs from simulation runs and a simple implementation of their idea. I thought this paper was kind of long-winded about the idea they were discussing. I thought it could have been summarized in a more concise fashion. Overall, the idea was interesting and bimodal properties were a good contribution to the epidemics world. However, I question any protocol that relies on IP multicast as the underlying technology due to the many problems associated with its deployment. However, it appeared that gossip mongering could be used as a backup. This paper seemed like a nice follow-up/next-step to the previous paper. The last paper Gossip-Based Ad Hoc Routing by Haas, Halpern, and Li shows how the ideas from the previous papers can be applied to ad hoc networks such as wireless and sensor networks. Ad hoc networks differ from traditional networks in that communication is primarily dependent on local broadcasting to neighbors. Traditionally, these networks would use flooding techniques to disseminate data to every node in the network. However, this is costly considering the nodes are typically resource constrained. The authors show that using a simple gossip protocol similar to the one in the original paper (rumor mongering) can be effective with less redundant messages. They give a variety of gossip schemes to improve performance like increasing the probability of remaining active due to low amount of neighbors or the amount redundant message received. They also show that gossip is reliable due to the bimodal features discussed in the second paper. This way then can retry quickly if the original gossip happened to fail. Finally, they give a lot of statistics and data about how the protocol performs with different parameters and environments. This is an interesting application to the epidemic algorithm because ad hoc networks really require this type of protocol since there isn't a direct communication link between all nodes. One thing I noticed was that the gossip optimizing techniques required knowledge about your neighbors, which isn't a simple assumption. Overall, I think this is a valid idea and should out perform flooding techniques. From: Romit Roy Choudhury [croy@crhc.uiuc.edu] Sent: Thursday, September 16, 2004 12:43 AM To: Indranil Gupta Subject: cs598ig review 9/16 Epidemic Algorithms for Replicated Database Maintenance ------------------------------------------------------- This paper addresses a simple and interesting problem of maintaining replication consistency among a set of objects (databases for example) in which updates and deletions are taking place periodically. The copies of the databases are distributed on multiple nodes, connected by a network. This paper investigates 3 mechanisms -- Direct mail, Anti-entropy, and Rumor mongering. Three primitives, called the push, pull, and push-pull primitives are described for means of updation, and the algorithms are designed on these primitives. An interesting observation in this paper is the difference between the performance of push and pull. When using anti-entropy, pull performs better than push on average, in terms of the time of convergence. When using with rumor techniques, and complementing the technique with a removal probability, authors observed that pull again performed better. The intuition is that for push, infectious nodes contact each other and remove out faster, while for pull, if susceptible nodes contact each other, they do not dampen their ability to continue trying. As a result pull results in better percolation of the updates over the entire network. Another interesting point relates to deletions on the objects. The issue arises when a deletion is overwritten by other old copies of the object. To address this problem, authors propose a death certificate scheme in which the deletion of an item in a database is remebered separately so that on trying to overwrite it, the death certificate can act as a prohibitive notice. This allows deletion updates to propagate faster. Gossip-Based Ad Hoc Routing --------------------------- This is a proposal to reduce the overhead of broadcasting in wireless networks using the gossip mechanism. Nodes, upon receiving a message, rebroadcasts the message with a probability p. Gossiping shows bimodal behavior, but may be applicable for practical purposes if carefully modified. An interesting suggestion with gossiping was the adaption of p, based on the overheard messages in a small duration of time. This is a means of handling the possibility of a gossip dying out before reaching all the nodes. Another variation was where p was chosen based on the number of neighbors of each node. Zone based gossip, whereby gossips need to spread to multiple zones (and within the zones, messages are spread with surity) is another interesting approach. Where topologies are characterized by critical nodes (say nodes that connect two clusters), gossips can be dangerous since not reaching this critical node can prevent the gossip from spreading to one of the clusters. In such cases, adapting p based on neighbor relationships can be another variation to gossip. Bimodal Multicast ----------------- This is a proposal of a multicast scheme that tries to provide throughput stability with the element of scalability. The proposal tries to benefit from using gossip-based schemes, like anti-entropy. The anti-entropy scheme is used as a means of supporting potential losses in the network that can happen in the regular multicast stages. Processes exchange a summary of their recently observed messages, thereby converging to identical histories, while not getting delayed due to transient losses. Several interesting optimizations were proposed to the pbcast mechanism. The performance results showed that even under high perturbations, healthy processes remain unaffected. ----------------------------------------------------------- Romit Roy Choudhury www.crhc.uiuc.edu/~croy ----------------------------------------------------------- From: Jigar Harshad Doshi Sent: Thursday, September 16, 2004 12:39 AM To: Indranil Gupta Subject: 598ig review 09/16 Epidemic Algorithms for Replicated Database Maintenance: Brief Summary: This seminal paper presents algorithms for maintaining distributed databases. Three approaches are outlined, namely direct mail, anti entropy (a simple epidemic) and gossiping (called rumor mongering - a complex epidemic). Algorithms which combine the approaches are also presented specifically anti entropy is shown to be a good back up mechanism to any approach used in the foreground. Anti entropy uses three mechanisms namely push, pull and push – pull and then combines it with direct mail in order to update a database. Pull and push-pull have similar performance results whereas push is found to be less efficient in the simple versions of the algorithm. However if a the number of connections is bounded, the performance changes .The key drawback of anti-entropy is that the entire database is checked. However mechanisms to get around that too are presented. The key contribution of this paper is however the gossiping mechanism (referred to here as rumor mongering) proposed. An analysis of gossiping is presented and the authors work on variants such as blind, feedback, counter and coin. An optimized mechanism obtained by the combination of gossiping and entropy is presented. Deletion updates become complex when gossiping issued. The authors present a solution to the problem using death certificates with two timers. These algorithms are highly dependent on underlying topology and the authors work on optimizing the average distance of neighbors chosen according to the topology in order to get logarithmic performances. However this is not guaranteed. Also when link weights are applied to the protocol then the problem becomes even more complex. One of the problems with gossiping is that it may have quiescent behavior where in the gossip dies out before it reaches all nodes. But the authors have analyzed the problem and presented mechanisms to get around the same. One of the key issues with applying the algorithms presented here to other systems is the fact that here it is assumed that the every node is fully connected and can thus communicate with all other nodes. It also assumes that every node ‘knows’ every other node. Thus the optimized algorithms presented here have to be changed in order to accommodate more restrictive systems, also gossiping as a mechanism is insecure in that the targets are random and it can be subject to spoofing attacks as well as man in the middle attacks. They also may be vulnerable to DoS attacks. Thus security could be added to a gossip protocol. The Key Design issue learnt here is the simple and complex epidemic mechanisms that can be applied to any search and distribution problems. Gossip Based Ad Hoc Routing: Broadcast is a fundamental mechanism in any ad hoc routing protocol. However broadcasting as a mechanism is very inefficient; in this paper gossiping is applied to broadcasting in ad hoc networks. Instead of broadcasting every time a node receives a message, a node broadcasts with probability p. Thus broadcast redundancy is avoided and in the simple case, a saving of 1 –p can be obtained. One of the problems with gossiping is that there is always a danger that the gossip dies out before reaching all the nodes. Thus a number of optimizations on gossiping are proposed. The first optimization is having gossip broadcast with a probability 1 if number of hops is less than k. Also gossiping is studied with different graphs of interest. In particular the source of the gossip’s position is studied and the authors discover that the nodes at the border always have less gossiping throughput. A few heuristics are provided to improve gossiping like a two threshold scheme. Gossiping retries are introduced in order to improve performance. Also gossiping in ad hoc zones as proposed in ZRP is studied. Finally the authors apply gossiping to AODV and study the performance which is shown to improve considerably. One of the prime shortcomings of the paper is that it does not provide any analysis involving node mobility. One of the main problems in ad hoc networks is node mobility. How the performance of gossip varies when node mobility at different velocities is an open question. The authors have studied this when it is applied to AODV, but analysis of the effects of mobility on gossip itself is missing. For example due to node mobility the heuristics applied like Gossip(p,k,m) are invalid. A node may not hear the gossip m times because it has moved into a new neighborhood and thus it will redundantly broadcast. Though the message still reaches everyone, the efficiency decreases. Further work could involve applying Gossiping to any ad hoc routing protocol like OLSR, DSDV etc. Such proactive protocols are prime candidates for gossiping. Also in ad hoc networks finding one hop neighbors of any node is not very difficult. Low overhead mechanisms like snooping can be used. Thus a node can improve gossiping using the 1 hop neighbor knowledge it has. It can select gossiping targets and not rely on broadcasts like the previous example. BiModal Multicast: Bimodal multicast is a technique to make multicast protocols reliable. It strikes a balance between protocols that offer all or nothing delivery and the best effort protocols whose reliability is hard o analyze. The authors claim that a designer of critical applications is forced to choose between reduced scalability but stronger notions of reliability in the first case and weaker guarantees but better normal-case scalability afforded by the second class. Bimodal places itself between these two classes by predictable performance, easy tuning and scalability. Some of the salient properties of the protocol include the fact that it is atomic wherein in all or nothing is replaced with almost all or almost none. Its throughput is stable and ordering is fifo which is crucial for applications such as audio and video streaming. The protocol detects the stability of messages meaning which can be garbage collected and also detects lost messages. While guaranteeing this, bimodal multicast remains stable. The protocols architecture is pretty simple. It initially uses an unreliable multicast to deliver the message. The a gossiping mechanism takes over in the background. The gossiping identifies the message but does not include them. Any missing message is solicited which is then followed by retransmission of the required message. The messages are delivered in FiFo order and are garbage collected appropriately. However there is always a possibility of someone getting left behind and overloading the system. To avoid this, a number of optimizations are provided like LRU based request of retransmissions first. Some of the limitations and assumptions of the protocol include the fact that the throughput and loss properties of most links are assumed to be known. Also it is assumed that most processes are responsive to messages in bounded time. Lastly the protocol does not consider Byzantine failures. Also flow based priority maintenance of failure lists could improve the protocol, by providing service differentiation and QoS type parameters. One of the salient features of the protocol is that it is stable even uder heavy perturbrations and has low overhead. Overall it is a good tradeoff for users of multicast protocols. From: Pradeep Kyasanur [kyasanur@crhc.uiuc.edu] Sent: Thursday, September 16, 2004 12:14 AM To: Indranil Gupta Subject: 598ig review 09/16 Pradeep Kyasanur netid: kyasanur 1. Epidemic algorithms for replicated database maintenance, A. Demers et al, PODC 1987. The paper addresses the problem of efficiently maintaining loose consistency in a replicated database in the presence of updates. Randomized algorithms that utilize the results from epidemiology are proposed to support efficient updates. The updates proceed in two phases: in the first phase updates are sent to each replicated database deterministically. Since, the underlying network may fail to deliver all messages, a second phase based on "anti-entropy" is used to ensure with high probability all database hosts receive a copy of the update. Anti-entropy is an mechanism where every node periodically chooses another node at random and exchanges update messages. If the node initiating the exchange already has a copy of the update and wishes to pass it on to other nodes, then the algorithm is defined to be a "Push" algorithm. Similarly, if the initiator of the exchange is a node seeking an update, then the algorithm is defined to be a "Pull" algorithm. The paper proves that "Pull" algorithm is suitable (when used in the second phase of the update process). Rumor mongering is a more powerful epidemic algorithm that can remove the necessity for an initial deterministic update phase. The reliability of a rumor-mongering algorithm can be enhanced by using an additional phase that uses anti-entropy. The paper also considers mechanisms to tune the proposed algorithms to the spatial distribution of nodes in the network. Epidemic algorithms provide efficient solutions to problems for which probabilistic guarantees suffice. The algorithms scale well with to large networks. However, there appears to be a latency penalty involved with using a probabilistic solution, as updates may traverse indirect routes. 2. Bimodal multicast, K Birman et al, ACM TOCS 1999 Bimodal multicast is a multicast protocol that offers scalable performance in addition to providing probabilistic reliability guarantees. Earlier work either provided rigorous reliability at the cost of scalability, or scalability at the cost of best-effort reliability. However, there is a large class of applications that desire both reliability (probabilistic guarantees suffice) and scalability, and bimodal multicast addresses this need. Bimodal multicast uses two stages: the first stage uses an unreliable multicast protocol, such as IP multicast, to deterministically multicast messages along a randomly chosen spanning tree that encompasses the multicast group. Since message delivery is unreliable a second phase based on anti-entropy is used to ensure all nodes receive the message with high probability. Bimodal multicast prioritizes the recovery of recent messages as they are often more valuable to the applications than old messages. The two stages are typically executing in parallel, and together provide high reliability. The use of anti-entropy ensures "bimodal" behavior, i.e., messages are received by most nodes with high probability, and received by few nodes with low probability. The bimodal behavior provides an "almost all or almost none" property, which is sufficient for many practical applications. The paper makes a strong case for the utility of probabilistic communication tools in practical applications. 3. Gossip-based ad hoc routing, Z. Haas et al, Infocom 2002 Ad hoc routing protocols often use a flooding mechanism for route discovery and maintenance. This paper proposes the use of a gossip mechanism (a form of probabilistic flooding) to reduce the overhead of flooding, while continuing to find routes with high probability. Nodes forward the routing packet with a specified (gossip) probability p, which can be tuned based on the underlying network topology. The authors demonstrate that p between 0.6 to 0.8 is sufficient for most networks, thereby reducing routing overhead by 20% to 40%. The paper also integrates the gossiping mechanism with well-known ad hoc routing protocols. Theoretical results show that routes chosen using gossip mechanisms may be longer than routes chosen using flooding. Consequently, if routes change infrequently, it may be better to use flooding to optimize the chosen route. From: Dennis Y Chi Sent: Thursday, September 16, 2004 12:03 AM To: Indranil Gupta Subject: 598ig review 09/16 Epidemics Gossip-Based Ad Hoc Routing The gossip-based ad hoc routing protocol attempts to reduce the overhead of routing messages in comparison to other routing protocols that use flooding. Gossiping involves forwarding a message with a probability, p, which results in a bimodal behavior such that either very few nodes receive the message, or most of them do. The main gossip protocol described by the paper, Gossip1(p,k), gossips messages with probability 1 for the first k hops and with probability p for the rest of the hops. Obviously, performance is dependent on the choosing optimal values of p and k, which is dependent on the topology of the network (size, node degree, etc). The paper provides theoretical analysis and performance results to show that the gossiping protocol exhibits bimodal behavior, that higher values of p result in more nodes receiving the messages, and that k does not increase the probability that a node receives the message. Various optimizations can be made to the gossip protocol to improve performance such as retrying the protocol if it is known to have failed, increasing the gossip probability of a node’s neighbors for nodes with lower degrees (fewer neighbors), re-broadcasting messages to avoid premature gossip failure, or integrating gossiping with zones. Gossiping is definitely advantageous in specific ad hoc networks where mobility is high and broadcast messages are frequent, but can networks cope with the bimodal behavior? Flooding may result in unnecessary additional message overhead, but at least it guarantees that all nodes receive the messages. Gossiping alone does not seem like a viable routing protocol for ad hoc networks, but it provides advantages when integrated with existing protocols that use flooding mechanisms. For example, the integration with AODV showed that the use of gossiping for routing requests improved the end-to-end delay and packet delivery success without adversely affecting route lengths. Epidemic Algorithms for Replicated Database Maintenance When designing algorithms for replicated database maintenance (or other distributed applications), the time required for updates to propagate, the amount of traffic generated by updates, and scalability are important performance factors that should be considered. One of the most basic methods is direct mail, in which all other sites are notified immediately after an update occurs. This strategy propagates updates quickly, but the mail system may fail and/or source sites may not know about all the other sites. Anti-entropy is a simple type of epidemic mechanism, in which every site regularly chooses another site at random to check for updates and exchange database contents (if necessary). This method is more reliable than direct mail, but propagates updates much slower. Rumor mongering is a more complex type of epidemic in which sites will periodically send updates to random other sites, but will stop propagating the update when too many other sites have already received it. This method is more reliable than direct mail, requires fewer resources than anti-entropy, but cannot guarantee that all sites will receive the update. The paper also shows that spatial, non-uniform distributions can be used with anti-entropy and rumor mongering to reduce traffic per link without any overly negative effects. This paper and the “Gossip-Based Ad Hoc Routing” paper both describe methods of using the gossiping/epidemic method in conjunction with some other method. For example, this paper describes how anti-entropy can be employed as a backup mechanism for complex epidemics. It is interesting to see how different methods can be integrated with gossiping/epidemics to improve its reliability, but take advantage of its low overhead. Bimodal Multicast Currently, reliable multicast protocols provide either strong guarantees on message delivery or scalability, not both. The bimodal multicast protocol is an attempt to improve multicast protocols by providing scalability, reliability, and stable delivery throughput. Bimodal multicast has two phases that work concurrently: an unreliable multicast protocol (i.e. IP multicast) and a two-phase anti-entropy protocol that first checks for message loss and corrects it if necessary. Additional optimizations must be made to the protocol to handle unpredictability, including placing data limits on retransmissions per process per round, avoiding redundant retransmissions, sending most-recent-first retransmissions, and multicasting retransmissions if there were multiple requests for the same message. Based on the performance results, the bimodal multicast protocol has a low overhead, scales well, and maintains steady throughput under unpredictable conditions (high perturbations). But one of the strengths and weaknesses of this protocol is that it is built for very specific environments. The protocol basically assumes that the network is a synchronous system with no byzantine failures (section 3) and is only deal for applications with high-volume workloads that can handle small inconsistencies. But because it is so specific, if an application fits the requirements, all the parameters can be set to achieve the application’s reliability requirements. As stated earlier, it seems that these epidemic protocols are not meant to be stand-alone, and this protocol is no exception. “We see it as a tool to offer side by side with other reliability tools, but not as a solution that competes with previous protocols”. This protocol also might have problems dealing with nodes that are flooded with requests (in the anti-entropy stage) and the paper doesn’t seem to mention the overhead of maintaining a consistent group view. Bimodal multicast’s two-phase protocol is interesting because it is very similar to the one described in “Epidemic Algorithms for Replicated Database Maintenance”, in which the Clearinghouse Servers on the Xerox CIN used direct mail followed by anti-entropy. In that case, the anti-entropy was used to fix any errors by the direct mail stage. The multicast + anti-entropy protocol is similar, but it seems like the unreliable multicast stage is actually working for the gossip protocol, attempting to infect as many nodes as possible to increase the probability that the anti-entropy stage is successful. From: Moosa Muhammad Sent: Monday, September 13, 2004 12:59 AM To: Indranil Gupta Subject: 598ig review 09/16 Reviews Written By: Moosa Muhammad For Presentation Dated: 9/16/2004 (1) Summary of “Epidemic algorithms for replicated database maintenance” In order for the database to be reliable and handle larger traffic loads, it is replicated among various sites. This results in the problem of keeping all of them consistent. This paper describes several randomized algorithms for distributing updates to drive the replicas toward consistency, and offers a good replacement for complex deterministic algorithms. The algorithms that are presented here closely resemble to epidemic algorithms. The Anti-Entropy algorithm has been implemented in the Clearinghouse servers of Xerox, in order to solve problems of high-traffic and database inconsistency. By using a well chosen spatial distribution for selecting Anti-Entropy partners, the implemented algorithm reduces average link traffic by a factor of more than 4 and traffic on certain critical links by a factor of 30. Important factors to keep in mind while choosing a suitable algorithm for this problem are: The time required for an update to propagate to all sites and the extra network traffic generated due to this. They proposed and analyzed three methods of spreading updates, which are: (a) Direct Mail: each update is immediately sent from its entry site to all other sites. After a node receives an update, the timestamp of the update is compared with the node's local timestamp. If the update has the later timestamp, then it is applied to the node. (b) Anti-Entropy: each site regularly chooses another site at random, to exchange its database contents with it, and resolves any differences b/w the two. (c) Rumor Mongering: when a site receives a new update, it shares it with other sites. When it has shared with a large # of sites, then it stops propagating that message any more. In order to delete items from the database, we first spread death certificates (carrying timestamps). During propagation, when old copies of deleted items meet these certificates, the old items are removed. A simple solution to making sure that these certificates are removed completely is to hold them for some time (e.g. 30 days) and then discard them. Direct Mail method sends updates immediately and it is also reasonably efficient (as the traffic generated by it is proportional to the number of sites times the average distance b/w sites). There are reliability concerns with it b/c individual sites do not always know about all other sites and since mail is sometimes lost. Analysis for Anti-Entropy showed that it propagates updates slower than Direct Mail, but is very reliable. An issue faced by the Rumor Mongering method is that an update is not guaranteed to reach all sites. ---------------------- (2) Summary of “Gossip-based ad hoc routing” They propose a gossiping-based approach, where each node forwards a message with some probability, to reduce the overhead of the routing protocols. Gossiping exhibits bimodal behavior in sufficiently large networks. Thus, in almost all executions of the algorithm, either hardly any nodes receive the message, or most of them do. The ideal case is when the fraction of executions where the gossip dies out is relatively low while keeping the gossip probability low, to reduce the message overhead. The goal of this paper is to investigate the extent to which this can be done. Their results showed a 35% decrease in message overhead as compared to flooding. Adding gossiping to Ad hoc On Demand Distance Vector (AODV) routing algorithm can also improve network performance in terms of end-to-end latency and throughput, along with improvement in the number of messages sent. Their work differs from others since they are applying gossiping to ad-hoc networks, where the assumption that every node can send a gossip message to every other node (due to a link present b/w them) does not always hold. Their problem is to find routes to different nodes. Their basic Bimodal behavior based gossip protocol is GOSSIP1(p,k). In this protocol, they gossip with probability 1 for the first k hops before continuing to gossip with probability p. The source sends the route request to nodes within k hops, with probability 1. When a node receives a route request, it broadcasts the request with probability p and discards it with probability 1-p. Duplicate messages are discarded by the nodes. Their first experiment involved a "medium-sized" network of 1000 nodes. With GOSSIP1(0.72,4), almost all the nodes received the message, with a slight drop-off at distance > 50 hops. They noticed that lowering the gossip probability further can result in the same bimodal behavior, if the fraction of executions in which all nodes and no nodes get the message, is changed also. Next they applied their protocol to random graphs, first consisting of 1000 nodes (degree=8) and with 1200 nodes (degree=10). With probability of 0.65, they noticed that almost all nodes got the message in almost all executions. All graphs above showed a drop-off in probability for nodes that were close to the boundary of the grid. Some of the optimizations that they suggested for the gossiping protocol include: (a) A two-threshold scheme: The idea behind this is that the gossiping protocol can run in conjunction with other protocols that maintain fairly accurate information regarding a node's neighbors. The gossiping protocol can then make use of that extra information. (b) Preventing premature gossip death: If a node with n neighbors receives a message and does not broadcast it, but then does not receive the message from at least m neighbors within a reasonable timeout period, it broadcasts the message to all its neighbors. They found m=1 to be the optimal value. (c) Retries: When using a gossiping protocol, there is always a possibility that a route will not be found even if it exists. A simple solution to this problem, which also works pretty well, is to retry the protocol. Because of the bimodal distribution, a good way of deciding whether to retry the protocol or not, is based on the number of duplicate messages received by this node. If large number of duplicates is received then no retry; otherwise yes. (d) Zones: This optimization for flooding protocols can be applied here also. Each node maintains a routing table of all the nodes in its zone (i.e. within a certain number of hops from this node). Then messages can be sent to a destination node (present in its zone) directly, with minimum routing delay. The criticism of this paper was that it contained a lot of heuristics for choosing gossip parameters, without providing a good explanation of why these chose those particular parameters. Some ideas for future work in this area include finding good techniques to come up with appropriate gossip parameters, instead of just basing it on heuristics. Another extension to this could be to have the gossip parameters (especially the probability) dynamically change for each node that the query is routed through, based on some real-time network statistics. Also, it can be tried out in conjunction with other routing protocols like the Grid. ---------------------- (3) Summary of “Bimodal multicast” This paper presents a new protocol called bimodal multicast (or probabilistic broadcast) protocol that scales well, and provides predictable reliability and stability even under highly disturbed conditions, such as router and IP multicast failures. It lies in between the two extremes most commonly studied to provide reliable multicast: a) Reliable protocols that guarantee delivery (guarantees atomicity which is expensive and my lack stable throughput needed in soft real-time applications) b) “Best-effort” delivery protocol that is inexpensive and scalable, but lacks the end-to-end guarantees (needed for critical applications) This protocol is supposed to be used in conjunction with another protocol like virtual synchrony, to handle the high-volume workload of an application. The reliability and throughput of this protocol remains steady even as the network packet loss rate rises to 20% and even when 25% of the participating processes experience transient performance failures. They also showed that the LAN implementation of their protocol overcomes bursts of packet loss with minimal disruption of throughput. This protocol satisfies the following properties: a) Atomicity – each multicast will reach almost all or almost none of the processes b) Throughput Stability – low variation in multicast throughput rates c) Ordering – FIFO ordering on a per-sender basis d) Multicast Stability – the protocol detects stability of messages, i.e. bimodal delivery guarantee has been achieved e) Detection of lost messages – processes (most likely faulty) that did not receive a message are informed via an upcall f) Scalability – overhead costs are constant and throughput variation grows slowly, as the network size increases The main protocol consists of two subprotocols. The first is an unreliable hierarchical broadcast that makes a best-effort attempt to efficiently deliver each message to its destinations. The second is a 2-phase gossip-based anti-entropy protocol that operates in a series of unsynchronized rounds. During each round, the first phase detects message loss, while the second phase corrects such losses and runs only if needed. In order to limit this subprotocol’s costs under failure scenarios, some optimizations (such as soft-failure detection) can be applied to it. One of the criticisms against this paper is that they did not consider Byzantine failures in their experiments and analysis, even though it is an integral part of guaranteeing stability of reliable multicasts. Also, the system model that they used to analyze their protocol is a static set of processes communicating synchronously over a fully connected, point-to-point network. Therefore they assumed a synchronous model, where message delays were bounded and known. A possible future work would be to modify their basic system model to accommodate for asynchronous networks and come up with performance analysis for it. -- Moosa Muhammad Department of Computer Science, UIUC https://netfiles.uiuc.edu/mmuhamma/www/