From: Chandrasekar Ramachandran [cramach2@uiuc.edu] Sent: Tuesday, March 25, 2008 10:28 AM To: indy@cs.uiuc.edu Subject: 525 review 03/25 1 .Peer-to-peer membership management for gossip-based protocols Ayalvadi J. Ganesh , Anne-Marie Kermarrec and Laurent Massouli´e In this paper a scalable peer to peer membership protocol is presented. By generating reliable partial views, a robustness to failure is achieved by the proposed protocol. The protocol also vitiates the need to know the size of the entire network, and is thus scalable in design. A comprehensiveness is achieved in design as there is a consistent update of these views while maintaining the other requirements of a scalable membership protocol. By maintaining these partial views, different destination targets for gossip notifications of around log(n) size. Following are the pros and cons of this paper. Pros: 1. The choice of the probability with which a new subscriber is integrated in the partial view seems to be a very intuitive way for balancing view sizes. 2. The high rate of a gossip based messages reaching the surviving nodes even in times of nearly 50% failure Cons Not much that I can think of. But one thing that comes to my mind is the queuing or dealing of messages is not taken into account. 2. T-Man: Fast Gossip-based Construction of Large-Scale Overlay Topologies Mark Jelasity and Ozalp Babaoglu This paper presents a protocol for the construction of a class of general topologies.This protocol handles dynamism very well by organizing runs as epochs. The simulation experiments show the reasonably high precision of this protocol as only around 100 links go missing even in cycle 100. There is fast convergence time even for a very small view size.The authors propose this excellent topology construction protocol by the set of node descriptors and their associated methods. Pros: 1. The fantastic log convergence time for the protocol is a metric which favors the use of this protocol. and its scalability with a growing topology. Cons: Synchronization hasn't been explored in detail. From: fariba.mahboobe.khan@gmail.com on behalf of Fariba Khan [fkhan2@uiuc.edu] Sent: Tuesday, March 25, 2008 9:46 AM To: indy@cs.uiuc.edu Subject: 525 review 03/25 CYCLON: Inexpensive Membership Management for Unstructured P2P Overlays, S.Voulgaris et al, Journal Network Systems and Management, June 2005. Cyclon is a gossip-based membership management protocol for unstructured p2p.failures can be modeled as random graphs. At each step pairs of peers shuffle neighbors. The peer selected is the one with highest "age". Then it selects few pointers to move to the peer and peer send back few pointers for it. The paper leaves out lot of explanations. You have a good hint after some discussion but a lot of time a formal definition or explanation would be better. Some examples issues not answered: What is a random graph? Why "age" the nodes? How is the initial graph created? How will survive partitioning? At the end you know these are achieved through properties of random graph and random walk but the explanations are all over. They run an event-driven simulator in C++ for the experiments. In my best guess which should just be a random graph simulator. The eccentrics of real Internet and complexities introduced by layers and headers are not there. But still it is amazing to see how randomness can bring convergence in a big network. In conclusion I would actually say this was a cool paper on p2p network construction and management ------------------------------------------------------------------------ --------------------------------------------------- T-Man: Fast Gossip-based Construction of Large-Scale Overlay Topologies, M. Jelasity et al, U. Bologna Tech Report. 2004 T-man is a gossip-based protocol for construction of large scale p2p topology of various structures. Authors show how topologies like ring and torus can be formed. The basic idea starts with input of set of N nodes, each node having neighbor view size c and a ranking function R that can order a list of nodes in increasing distance from a given node. This ranking is based on distance of two nodes given by d(x, y). d determines the structure of the topology. For example for a ring d(x, y) is difference in node id x – y. Each node runs a periodic view exchange protocol and waits indefinitely for others nodes sending out views. The initial phase of convergence is exponential. But not complete. Authors make modifications to speed this up at the end. The change comes from the observation that at the end nodes are climbing on to already formed structure. The results have theoretical importance. Authors assume single synchronization point and reliable nodes and channel. This assumptions will be very costly to maintain in a real network. From: hossein.ahmadi@gmail.com on behalf of Hossein Ahmadi [hahmadi2@uiuc.edu] Sent: Tuesday, March 25, 2008 9:28 AM To: indy@cs.uiuc.edu Subject: 525 review 03/25 Hossein Ahmadi (hahmadi2) Review 7 ============================================================================== CYCLON: Inexpensive Membership Management for Unstructured P2P Overlays Spyros Voulgaris, Daniela Gavidia, and Maarten van Steen Keeping an unstructured P2P network balanced and having its diameter to be low are two desirable properties for higher level applications like gossip based dissemination. Membership management is responsible to provide such properties in an unstructured overlay. This paper first studies a simple approach called shuffling through the experiments. Shuffling tries to exchange some of each nodes neighbors with others periodically. In this way, the connectivity of the network is always maintained when there is no failure. Moreover, the degree of each node is concentrated around the size of its neighbor cache. However, authors show that basic shuffling can not provide enough randomness in the output overlay. The paper proposes CYCLON protocol based on shuffling. In CYCLON nodes exchange their neighbors as in basic shuffling. The difference is that they don't choose neighbors at random. Instead, they use the node with earliest joining information. The reason is to make to each neighbor's living time balanced. Through the experiments, authors show that CYCLON can provide more randomness than basic shuffling. Next, authors address the join and leave in CYCLON. They claim that using a random walk during the join would guarantee randomness. Nodes are not explicitly removed from the caches, but their failures are detected during exchanges which makes the protocol more simple. ------------------------------------------------------------------------------ Peer-to-Peer Membership Management for Gossip-Based Protocols Ayalvadi J. Ganesh, Anne-Marie Kermarrec, and Laurent Massoulie One of the essential points in all mulit-cast and gossip-based protocols is how to know which nodes are currently present in the overlay. Having a complete membership information requires use of a centralized approach or results in a non-scalable decentralized algorithm. The main motivation behind this paper is that we don't need a full membership information to have a gossip-based protocol working. Rather, we only need to have partial views of each node to be updated according to the real overlay membership even if no one knows about this full membership. The approach proposed in the paper tries to keep the partial view of each node a constant times the logarithm of network size. This is enough to guarantee the upper layer gossip-based protocol works efficiently. The partial view update is done through the following mechanism: when a node subscribes, a subscription request is forwarded to the nodes in the partial view of contact point. Each node receiving the subscription request, will update its partial view with a probability inversely related to its current size of partial view. During unsubscribe, protocol uses another list to inform the nodes with unsubscribing node in their partial view. Using above mechanism, authors show that the desired partial view size is acquired if the subscriptions are uniformly distributed of the network and are independent. The paper then relaxes the assumption of having uniform subscription and provides a balancing mechanism to address this issue. Two mechanisms, namely indirection and lease are proposed to work in conjunction with each other. Next, authors evaluate their protocol through simulation. First, they verify that the partial view size is indeed the logarithm of network size. Then, they show that the upper layer gossip-based protocol can perform almost as efficient as having full membership information. Next, the effect of several parameters over the reliability of the gossip-based protocol has been studied. The paper is very well motivated and the approach is very interesting. Authors, design their protocol based on the analysis of their objective which works very well. However, when proposing the rebalancing techniques it seems that the nodes require a full membership information to forward the subscription request which is in contradiction to the main point of the protocol. On the other hand, the scalability and message complexity of this protocol has not been investigated in the evaluation section. ============================================================================== From: Alejandro Gutierrez [agutie01@gmail.com] Sent: Tuesday, March 25, 2008 8:58 AM To: indy@cs.uiuc.edu Subject: 525 review 03/25 =============================================================== "Peer-to-peer membership management for gossip-based protocols" Ayalvadi J. Ganesh, Anne Marie Kermarrec, Laurent Massoulié Reviewed by: ALEJANDRO GUTIERREZ =============================================================== This paper from the UK Microsoft Research lab presents a peer to peer membership protocol with the characteristic that is does not assume global knowledge and provides each member with a partial view of the group membership. It also has the benefit of retaining some desirable properties of gossip-based protocols such as reliability and scalability. As I mentioned previously the proposed protocol requires each node to have a partial view of the entire system and the size of this view converges to log(s) which is required to achieve reliability in gossip-based protocols. The protocol also presents additional mechanisms such as indirection and lease mechanism to achieve load-balancing even in the worst case scenarios. This paper is well organized, including a well thought motivation, an accurate problem definition, a clearly defined approach, as well as some significant analysis and a useful simulation. One of the most interesting parts of this paper is the one where they describe how the size of the local view of each node converges to the optimum required for reliability. The authors of this paper defend their claim by providing an extensive simulation as well as a deep analysis on their behalf. One of the problems I think the authors should consider is taking into account the locality of the nodes during the process of subscription. On the other hand I don´t understand how the nodes would locate an arbitrary member in a dynamic and real world scenario such as the Internet. =============================================================== "T-man: Fast Gossip-based Construction of Large-Scale Overlay Topologies" Márk Jelasity and Ozalp Babaoglu Reviewed by: ALEJANDRO GUTIERREZ =============================================================== This is a technical report from the Computer Science Department at the University of Bologna. It presents a gossip-based scheme called T-man for building a large class of topologies in a large scale dynamic fully distributed system. The topology is defined by a single ranking function that ranks nodes according to their increasing distance from any given fixed node. The basic idea behind the protocol is to have the top c ranked nodes in each node’s partial view list based on some distance metric. The protocol considers three topologies (line/ring, mesh/torus, and binary tree) based on the space in which the nodes lie. The authors argue that by using their approach properties such as clustering and sorting can be effectively achieved. The simulation they provide shows the reader that initially the convergence takes places faster but tapers off significantly in the last part. Therefore the authors present two strategies to counterbalance the latter result. This paper has a well defined analysis of an extensive simulation. The authors generate a wide range of topologies that converge relatively fast, thus achieving in an effective manner properties such as sorting, clustering, and potentially semantic overlays. I would personally like to see how this protocol behaves with churn as well as fault-tolerance. From: Rahul Malik [rmalik4@uiuc.edu] Sent: Tuesday, March 25, 2008 8:59 AM To: Gupta, Indranil Subject: 525 review 03/25 Peer-to-peer membership management for gossip-based protocols SUMMARY: This is one of the beginning papers in peer-to-peer membership management for gossip-based protocols. The main novelty of the paper over previous work is that it is completely decentralized protocol and the protocol is self-converging. The basic membership protocol runs as follows. A new node joins the group by contacting an arbitrary peer and it forwards its subscription to a fixed number of nodes in its list and with a probability, they integrate it in its list. The size of the partial views at each node is maintained fixed and is log in terms of the number of nodes. The unsubscriptions from the system also preserves the scaling property of list sizes with the system size. Also, in order to recover from the isolation and in order to prevent partitions from happening in the system from node failures or from unsubscriptions, nodes constantly exchange heartbeat messages with each other. Finally, in order to have the subscriptions chosen randomly from the set of nodes, they define two mechanisms, indirection and lease mechanism. The indirection mechanism consists of two parts: forwarding rule and a stopping rule. The system has been evaluated over a large number of nodes and the paper evaluates all the design parameters in detail. PROS: The most fundamental contribution of the paper is that the protocol does not require any centralized operation. No global knowledge is needed to be maintained anywhere in the system. The partial views at each node is always logarithmic in number of nodes. The system exhibits the same degree of reliability as traditional gossip-based scheme which require each member to maintain the list of all group members. CONS: There are many areas in which they can improve their system. They have not implemented their system on some real testbed, so the pattern of node failures is not actually real. Also, they do not discuss issues in peer-to-peer systems such as security, which is essential property. They also do not address the issue of locality, that can be used to choose the members in the view. In their results, they also do not address the issue of overall network load, but are more concerned about load per node. T-Man: Fast Gossip-based Construction of Large-Scale Overlay Topologies SUMMARY: In this paper, authors have described a gossip based scheme called T-Man for the construction of a large class of topologies. They identify topology management as an abstract service and the topologies are defined by “who knows whom” relation. In this present work, they identify three specific topologies: ring, torus and binary tree. In the protocol that they have defined, each node has an address and maintains a partial view, which is a set of descriptors. So, the topology construction problem is actually modeled as a ranking problem by a suitable ranking function. The relation defines a partial ordering over the nodes in the view. In this paper, the ranking function used is distance function. In the first topology of line and ring, the distance is for a line. In the second topology of mesh and torus, the profiles are two-dimensional real vectors. Finally, in the binary tree, the profiles are binary strings of length m. They have implemented their protocol on a simulator and presented the simulation results over large number of nodes for all the topologies. PROS: Topology construction is a central problem in distributed systems and is needed for a number of reasons. They have defined the topology management as a general purpose function for a number of topologies. This is certainly very important for distributed systems. The defined protocol is scalable and fast. The convergence times grow as logarithm of the network size. So, as a result of this, several topologies can be created on demand. CONS: Some of the drawbacks of this paper are that they assume no node failures, which is very limiting assumption of their protocol in distributed systems. Also, they have used just distance functions for ranking of nodes in the problem. They should also implement others and show how the protocol performs over them. From: Riccardo Crepaldi [crepric@gmail.com] Sent: Tuesday, March 25, 2008 6:09 AM To: Gupta, Indranil Subject: 525 review 03/25 T-Man: Fast Gossip-based Construction of Large-Scale Overlay Topologies The motivation of the design of T-Man is the importance of an optimal topology in order to improve the efficiency of a large-scale overlay network. If a membership algorithm creates this topology without certain rules, it is possible that neighbors in the overlay are connected by physical paths that for many reasons are characterized by bad performance. T-Man is a lightweight protocol that allows the definitions of a ranking function based on several different parameters, that can be defined by the network administrators depending on the needs of the running applications. In this protocol the membership algorithm is intended as an abstract and independent service, based on a gossip-based scheme. The technical reports shows how it is possible, using different definitions for the ranking function, to create a line, a ring, a torus and a binary tree topology. Additionally, it is shown how T-Man can be applied to sort a set of numbers. The algorithm is shown to converge in approximately a logarithmic time. It scales well, because it is based on the concept of partial views of the network, where each node needs to know only a subset of the nodes in the network. The advantages of this protocol are for sure its lightweight and scalability. The short convergence time is also very important. The freedom that is left to the user in defining the ranking function allows the protocol to be very versatile. However, some of the metrics that are taken into account could be time- variant, is it possible that the protocol never converge and instead generates a topology that is too dynamic, if these metrics are considered in the ranking function? And how will node failures be managed? = ======================================================================== Peer-to-peer membership management for gossip-based protocols SCAMP is another gossip-based protocol that address the problem of scalability. In earlier gossip-based protocols, each node had to know all the nodes in its group, and then to select randomly which nodes to forward a message to. This approach limit scalability for sure. On the other hand, a partial view that is too small compared to the dimensions of the network will achieve very low performance in reliability. SCAMP is able to increase the partial view dimension when the system size increase. However, since each partial view has a size that is (c log(n)), the network that the protocol can serve can be very large. The subscription and unsubscription mechanisms allow nodes to be always up-to date about new nodes and nodes leaving the system. The protocol is shown to deal very well with node failures, providing performance close to those of a standard gossiping algorithm even when more than half nodes are failing. The problem of the non-random choice of initial node contacted by the join procedure is also addressed. The major contribution, other than the scalability of this solution, obviously, is the automatic growing of the partial view dimensions when the network increase. The most important drawback is that membership is completely random, ignoring any information about bandwidth, RTT or other important network parameters. The effectiveness of the solution in a geographically large network could be negatively influenced by this. From: dkassa2@uiuc.edu Sent: Tuesday, March 25, 2008 4:56 AM To: indy@cs.uiuc.edu Subject: 525 review 03/25 ====================================================================================================== Review 13 Paper Title: T-Man: Fast Gossip-based Construction of Large-Scale Overlay Topologies The paper proposes a gossip-based scheme called T-MAN for the construction of a large class of different topologies. In the gossip-based probabilistic approach for the topology construction problem the topology is defined by a single ranking function that ranks nodes according to increasing distance from any given fixed node. First the views of the nodes such that for a node x the view of x view_x contains exactly the first c elements of a possible ranking of the entire node set are constructed. When the ranking parameter d defines a metric space on the set of nodes, the ranking function can simply order the given set according to the distance from the base node. This way each node maintains addresses of other nodes through the partial view, which is a set of c node descriptors. This set is produced from the merged and ranked views of the neighbors (peers). Using some experiments and an approximate analytical model T-Man is shown to find the vast majority of the desired links in logarithmic time. The paper also points out that the convergence time depends on the target topology itself. The paper gives detailed explanation of the proposed protocol. However I have a problem with the assumption of the paper that there exists a single synchronization point when the protocol is started at all nodes. Besides if there is a set of nodes connected through a routed network where the T-Man algorithms starts, I have a problem on whether or not T-Man is trying to construct a topology based on an existing routed topology. Review 14 Paper Title: Peer-to-peer membership management for gossip-based protocols A Scalable Membership protocol (SCAMP ) which is a peer-to-peer membership management protocol that provides each member with a partial view of the group membership is presented. Unlike previous works SCAMP is self-organizing and provides fully decentralized membership management with the properties to achieve gossip-based multicast with high reliability. SCAMP provides each member of the group with a partial view, that is a list of identities of other group members. This forms the basis for broadcasting messages across the group, by enabling each member to propagate messages to all or to a subset of those members whose identities are in its own list. SCAMP is based on the design goals of scalability where the partial view at each node grows slowly, reliability, decentralized operation and isolation recovery. In SCAMP each node maintains partial, yet sufficient system view and view size in each member can change when system size changes. Besides any isolated node can rejoin the system automatically with isolation recovery mechanism. The paper gives detailed explanation of the algorithm and related literature. SCAMP is validated using theoretical analysis and a detailed simulation. I have no major criticism against this paper. ================================================================================================== From: Justin Tulloss [jmtulloss@gmail.com] Sent: Tuesday, March 25, 2008 2:00 AM To: Gupta, Indranil Subject: 525 review 03/25 CYCLON: Inexpensive Membership Management for Unstructured P2P Overlays This paper detailed a very simple shuffling protocol for maintaining a distributed graph of neighbors. Basically, a member would randomly send a subset of its known neighbors to a random neighbor on every cycle. The enhanced protocol would place time restrictions on pointers to known hosts to ensure that pointers to hosts who had left the network expired in a timely fashion. The paper mostly dealt not with the specifics of the protocol, but with the performance characteristics of a simulated implementation of the protocol. The authors tested a large number of different variables for convergence coefficients and robustness. They found, unsurprisingly, that larger caches led to lower clustering coefficients and shorter average paths. Pros: Simple protocol Well tested with a large number of variables Demonstrated superiority to other protocols Cons: Too long. Many of the results were unsurprising, they could have proven their point more concisely. T-MAN: Fast Gossip-based Construction of Large-Scale Overlay Topologies This paper concerned itself with describing a large network topology. It did so by creating an initial view through some external protocol and then cyclically sending its view to a random but ranked peer and merging the two. In this way, every node will eventually have a complete and correct view of its network topology. This algorithm converges very quickly, an outstanding topology can normally be acquired within 15 cycles. The authors show a good amount of data relating the size of the view and the rate of convergence. Pros: Simple, accurate, fast Cons: Without a membership protocol, I'm not sure why I should care. This paper assumes ridiculously ideal operating environments; nearly all of their findings would be jeopardized by a practical environment. From: Qiyan Wang [qwang26@uiuc.edu] Sent: Tuesday, March 25, 2008 1:14 AM To: Gupta, Indranil Subject: Review: CS525 by 3/25 Peer-to-peer membership management for gossip-based protocols This paper is an improved version of gossip-based group communication. Authors propose a fully decentrialized membership management scheme, where each member only holds a partial view of the group membership. The protocol is self-organizing and also achieve balanced view sizes even with highly unbalanced subscription patterns. + Partial vew of group membership, small size, and changes dynamically + Balanced membership management - how about implicit unsubscription? T-Man: Fast Gossip-based Construction of Large-Scale Overlay Topologies They argument the topology management, and propose a fast, scalable and accurate gossip-styled topology management scheme, called T-Man. They also present a common framework for this problem. T-man is designed for O(log n) converge time, and independence of topologies of distributed systems. Thay also discuess some practical issues, such as dynamics and synchronization. + Converge quickly O(log n) time + Fit with any topologies + Cool figure - synchronization needs fast broadcast, hard to meet in practise - any topologies, really? From: Mirko Montanari [mirko.montanari@gmail.com] on behalf of Mirko Montanari [mmontan2@uiuc.edu] Sent: Monday, March 24, 2008 10:59 PM To: Gupta, Indranil Subject: 525 review 03/25 03/25 - CS 525 - Mirko Montanari T-Man: Fast Gossip-based Construction of Large-Scale Overlay Topologies Reviewer: Mirko Montanari This paper, a technical report from the University of Bologna, presents a gossip-based algorithm for the creation of topologies in wired networks. An interesting aspect of the proposed method is the fact that it allows the application to define its concept of topology. Topologies are defined with the use of a ranking function: a function that, given a node, creates a partial order on the other nodes. With the use of this concept of ranking function the authors shows how it is possible to define of line, ring, torus and binary tree topologies. The protocol is fairly simple: each node maintains a partial view containing c of the nodes in the system. Each view gets initialized with a random sample of nodes. The view gets ordered according to the ranking function and a random peer gets selected from the upper-half of the list. These nodes exchanges their views, obtaining a list with size 2*c. This list gets ordered and only the first c nodes are maintained in the new view. PROS: + The proposed algorithm is simple and it provides a fast convergence toward the desired topology + The definition of the ranking function is very broad and can be used to to create many topology (not only distance based) CONS: - The authors don’t consider ranking function that changes over time. Distances such as RTT change over time, and it would be interesting to see how the algorithm behaves. Maybe it is possible to use the techniques that the authors propose to speed-up the final convergence phase of the algorithm, but this problem is not discussed in the paper. - There is no analysis of the resilience to failure in the topology that is formed. An analysis in presence of failure would be interesting to see. - The size c of the partial view needs to be chosen at design time. ---- Peer-to-peer Membership Management for Gossip-Based Protocols Reviewer: Mirko Montanari This paper proposes a membership-management protocol that can be used as a support to gossip-based information dissemination protocols. Gossip-based dissemination protocols require nodes to randomly pick other nodes in the system to communicate the message. In order to guarantee randomness in this selection, these protocols usually require a complete knowledge of the member of the group. This paper proposes a new mechanism for the management of the view in these application. This technique requires each node to know only a subset of the other nodes and it still is able to provide the reliability of protocols that have a complete knowledge of group members. The proposed protocol works by introducing a different subscription / unsubscription mechanism: when a node A joins the system, it contacts another randomly picked node B. B sends a message containing information about the join of A to each of its neighbors, and it also creates other c join messages to send again to randomly chosen neighbors. Each node that receives a subscription keeps it with probability p, otherwise it forwards it to one of its neighbors. The probability p of keeping a subscription decreases with the size of the view. This mechanism, in conjunction with a mechanism for unsubscriptions and other improvement designed to deal with non-randomness in the choice of the starting node and with the failure of nodes, is able to obtain a membership protocol that requires each node to maintain only a partial view of the system that grows automatically according to the size of the system. PROS: + Really interesting the property that the size of the view changes automatically with the size of the system. + The authors give ways to deal with the problem of non-random initial nodes that are contacted when a new node joins the system. CONS: - The choice of the neighbors node is random and does not take into account locality. It would be interesting being able to obtain similar properties with the selection of a partial view that prefers nodes “close” to the originating node. From: marefin2@uiuc.edu Sent: Monday, March 24, 2008 10:02 PM To: Gupta, Indranil Subject: 525 review 03/25 Peer-to-Peer Memebership Management for Gossip-Based Protocols A. J. Ganesh, A. Kermarrec and L. Massoulie This paper presents the design, theoretical analysis and evaluation of SCAMP, a membership protocol for gossip-based event dissemination. The noble idea here is to provide gossip-based membership dissemination by keeping the partial view of the system without knowing the actual system size. In this protocol, each node maintains a PartialView set containing the nodes it sends message to and a InView set containing the nodes that it receives gossip messages from. This protocol tries to keep the size of PartialView set equal to (c+1)log(n), where c is the system parameter. During the subscription, when a new node comes, it sends subscription request to a random node in the system called the contact node. Contact node forwards this new node-id to all the nodes in its partial view and send c more copy to randomly selected c nodes of its partial view. This is done to keep the size of PartialView near to (c+1)log(n) even if a new node joins. To select the contact node randomly, each arc (i,j) in PartialView and InView is assigned a probabilistic weight wi,j. So, when a node receives a subscription request, it assigns a counter (proportional to PartialView of ni) to it (only for the initial contact node) and forwards the request to j with probability Pi,j = wi,j/( ) along with 1 decreased value of the counter. When a node receives the forward subscription message with counter 0, then it will act as the contact node for the basic SCAMP subscription protocol. When a node (n0) wants to unsubscribe, it tells sequential l-c-1 (when l is the size of InView) nodes from its InView to replace n0 from their PartialView with the node id of sequential l-c-1 nodes from the PartialView of n0. The remaining (c+1) nodes in InView of n0 simply remove n0 from their PartialView. This is done to keep the partial view size nearly (c+1)log(n) even with any number of unsubscription. To recovery from isolation and failure, each node periodically sends heartbeat message to all the nodes in its PartialView. As SCAMP supports dynamic nature of nodes availability, subscription of a node expires after a predefined time and that node needs to re-subscribe again to the system with the same PartialView by sending subscription request to any randomly selected member from its PartialView. It provides a mechanism for coping with nodes, which either suffer crash failure or leave the group without unsubscribing. Some of the mentionable points about this protocol: · SCAMP does not require any centralized operation and global knowledge to maintain the system. As nodes join and leave the system, PartialView size scales automatically in proportion to the logarithm of the number of members in the group. · SCAMP shows nearly the same degree of reliability as traditional gossip based approach even with the presence of large failure. The authors show the comparison for 0% to 70% of failure in the paper. · SCAMP provides scalability in the sense that there is no need to store the whole system view and it doesn’t depend on system size. The size of PartialView grows slowly with the increase of group size. · The protocol doesn’t clearly explain why do we need to send weight update to each node and how does it effect the randomness, as each node periodically calculates the weight of each arc in PartialView and InView. It is also not clear, what will happen if several nodes try to send weight update to each other. · A figure representing the network load could be a good source of information for the application domain of this protocol. This protocol is not concerned about the locality information of the nodes. Also the authors mention to work on the message propagation delay as their future work. T-Man: Fast Gossip-based Construction of Large-Scale Overlay Topologies Mark Jelasity and Ozlap Babaogla This paper proposes a gossip-based protocol for constructing a general class of topologies, called T-MAN. This topology construction problem can be modeled as a collection of N nodes with view of size c and a ranking function R. Ranking function simply orders the given set of nodes according to the distance from a base node. The task is to construct the views of the nodes such that for a node x the view of x viewx contains exactly the first c elements of a possible ranking of the entire node set, that is, R(x, {all nodes except x}) contains a ranking that starts with the elements of viewx. Alternatively, there exists no node y outside viewx such that y ranks strictly lower than any elements from viewx. T-MAN uses two phases for topology construction. The first phase is the fast convergence phase. Each node maintains two threads, active and passive threads. The active thread runs at a random time once in each consecutive interval of T time unit. It randomly selects a peer from the view list and sends the view along with its own descriptor to it. The passive thread runs forever and waits for the incoming message. When it receives a view, it merges it with its own view. Thus ideally, the new view can be described as the closest c elements of random 2c samples. Using the inductive heuristics, after cycle i, each node contains the closest c elements of 2ic random samples. It can be easily proved that the convergence time i shows the logarithmic bound (< log2(N-1) – log2c ). But if a node communicates too many times or too few times, it will converge faster or slower respectively, and therefore it will provide less useful nodes to others when contacted, having a relatively more biased view. The second source of error is related to the fact that even if assuming that the views at all nodes follow the model in some cycle i, the closest c nodes the model refers to are closest to the node holding the view. Thus at the end of fast convergence phase, there may be some nodes left behind in convergence and they use already converged structure to climb gradually to their position and that convergence time is solely depends on the topology. This phase is called endgame. The authors further modify the T-MAN protocol by restricting the number of contacts to each node to a fixed number. Also the optimized endgame version of the protocol changes method SELECTPEER so that it assigns exponentially decreasing sampling probabilities to the neighbors according to increasing rank. It improves the performance of the protocol. The followings are some notable points of this paper: · A wide range of topology can be constructed and maintained in a fast and scalable manner. The convergence time of this protocol is logarithmic depending on the target topology. · T-MAN can be used in large distributed system to construct very different topologies or jumpstart other protocol rapidly on demand in a flexible and adaptive manner. · It is not clear from the paper, how a node can identify whether it has converged or not, at the end of the fast convergence phase. · This protocol needs an efficient and fast broadcast support for synchronization and dynamism. Also this protocol assumes efficient communication link, which may not be true in real scenario. · Also I am not sure how to select the base node and how to handle node joining and failure in that case. Arefin From: Hengzhi Zhong [hzhong@uiuc.edu] Sent: Monday, March 24, 2008 9:10 PM To: Gupta, Indranil Subject: 525 review 03/25 T-Man: Fast Gossip-based Construction of Large-Scale Overlay Topologies Summary: In this paper, it views topology management as a general purpose function that is desirable in large distributed systems. The algorithms for topology management must be fast, accurate, and scalable. To do so, the paper introduces a topology construction problem for topology management. It uses a gossip-based construction T-Man for building the topology. Each node maintains addresses of other nodes through a view, which is a set of node descriptors. The size of descriptions is the same for all nodes. Each node descriptor contains a profile. The profile is basically a description on how to define a certain type of topology (value ranges, distance function). The distance functions are defined for topologies such as line, ring, mesh, torus, and binary tree. The topology is defined by a single ranking function, which ranks nodes according to distance from any fixed node. T-Man is scalable and fast, and converges logarithmically to the network size. Pros: 1. certain topologies can be created on demand 2. convergence time is pretty quick Cons: 1. The graphs aren’t very clear and they aren’t explained well. 2. This algorithm doesn’t work well with certain topologies. If the distance function is the round trip time, then the algorithm for finding nodes that are close to the node may not work anymore. 3. This algorithm gossips locally. Is using a local algorithm optimal and accurate? 4. How are the fixed nodes selected? Peer-to-Peer membership management for gossip-based protocols Summary: The typical gossip-based protocols for group communications assume that each group member has the full knowledge of every member in the group and chooses members randomly to gossip. However, this assumption makes the protocols unscalable in large-scale groups. Hence, this paper proposes that each member maintains a partial view of the group membership instead of a global view. The size of the partial views grows logarithmically, but eventually converges. The view sizes are also made balanced so there won’t be members that have a huge view while some have a small view. Each member generates a message and sends it to a random subset of other nodes. When any node receives a message, it does the same. New nodes join in by sending a subscription request. When a node receives this request, it forwards the new node-id to all members of its own local view. It also creates additional copies of the subscription and forwards them to the nodes in its local view. There are two ways to balance the partial views. One depends on randomized way to pick gossip targets. The other way is lease, which basically redistribute the nodes in the partial views every certain time interval. Pros: 1. each member maintains a balanced partial view of membership Cons: 1. To make sure the protocol doesn’t forward an infinite number of times, there is a cap on the number of times for forwarding a subscription request. How will this cap limit performance? What are the trade-offs? 2. heartbeat mechanism is used for recovery from isolation. But heartbeat isn’t very efficient and scalable. So, how often does isolation occur in the system that it’s okay to use heartbeat? From: ysarwar@gmail.com on behalf of Yusuf Sarwar [mduddin2@uiuc.edu] Sent: Monday, March 24, 2008 8:42 PM To: Gupta, Indranil Subject: 525 review 03/25 Paper: CYCLON: Inexpensive Membership Management for Unstructured P2P Overlays Review: CYCLON is a group membership protocol proposed for maintaining overlay structure in a large distributed system. It is based on enhanced shuffling mechanism where nodes periodically exchange their neighbor list with other neighbors. The basic shuffling algorithm is quite simple. Each node keeps a local view of the system, called neighbor list or cache, a set of nodes this node knows of. To adapt network dynamics, every node picks a subset of its neighbors at random and shuffles their positions in the graph along with a random subset of their earlier neighbors. By this shuffling, membership changes among nodes and the network connectivity changes with time. The authors bring little change in the basic scheme. Instead of choosing shuffling node randomly from the neighbor list, each node picks the neighbor (along with others chosen randomly) about which it came to know of most lately, that is, the oldest node becomes most preferable candidate for shuffling. This change to the scheme gives more robustness to failure detection, because now, as time passes, older nodes become more prone to be picked for shuffling and eventually will be detected as dead, if they have been so. When a new node joins, it contacts a currently alive node, called introducer. The introducer initiates a number of random walk declaring the presence of this new node by sending a set of TTL bound messages. Whichever node receives this join message with TTL=0, it treats this new node as its neighbor and it itself becomes a neighbor of that new node. The authors run simulations to analyze the behavior of their approach for several metrics. Pros: - Simple idea, works fine, reduced message complexity. - Eliminates all heartbeat like message passing for failure detection technique. The protocol itself takes care of it in its usual operation. - If the network gets to be connected at any instance of time, it is shown that the network will eventually remain connected thereafter, since shuffling does not disconnect a connected network. - The membership size closely resembles to the in-degree size of a random graph. Cons: - How is the value of c, the cache size, chosen? What should its optimal value be? - Why is the average path length from a node treated as a performance metric of a membership protocol? It's intuitive that membership size (cache size), connectivity level, node coverage could be more important aspects. ======================== Paper: Peer-to-peer membership management for gossip-based protocols Review: In earlier proposed gossip based membership protocols, nodes are required to know of all other nodes in the system, and when a node tries to gossip in order to disseminate a certain information to the entire network, it selects one of them at random. But for a large system, maintaining the entire knowledge is not scalable any way. In this paper, the authors propose a membership protocol that does not need the complete knowledge of the network, just a partial view of it. SCAMP (scalable membership protocol) maintains a partial view for each node, and these partial views satisfy the connectivity of the network. SCAMP supports gossip based multi-cast with strong atomicity. SCAMP has subscribe (join) and unsubscribe (leave) mechanism for nodes. When a node joins, it contacts an existing node, called contact. The contact forwards this join message to all of the member of its partial view and also creates some additional copies of this join message and forwards them to randomly chosen member in partial view. When a node receives a forwarded join message, it decides with some probability either to forward this or hold the new node in its partial view. For balancing the graph, a random walk is also proposed while selecting the contact. When unsubscribing, the leaving node exchanges its partial view with its partial view members. The protocol uses usual heartbeat messages to detect isolation. When a good number of nodes in its partial view are detected as dead, a node initiates a subscription procedure as if it joined the system. Pros: - Gossip based group membership protocol. - Keeps partial view size small, in order of log(n).. Cons: - It hasn't been shown, if a multicast is made, how many nodes (or what fraction of nodes) are eventually reached? - It is not obvious how the impact of 'indirection' is justified. It seems too costly for getting small improvement. - The authors consider formal unsubscription of nodes, but nodes can abruptly leave the system without any explicit notification to partial and In view members. In that case, partial view update can't be possible. ===================== From: Anthony Cozzie [acozzie@gmail.com] Sent: Monday, March 24, 2008 8:11 PM To: Gupta, Indranil Subject: 525 Review 3/24 Cyclon It is difficult to imagine a more simplistic algorithm (I like simple algorithms, since they make for short reviews). Cyclon nodes simply gossip about their membership, with a few heuristics (nodes swap instead of a pure gossip, and prefer to gossip about "older" information). The paper has a mass of graphs. Really I don't think it's possible to comment much on this paper: the authors have already analyzed their idea to death. T-Man It took me some time to figure out what the T-man protocol was trying to accomplish. I chalk this up to poor writing, and I also do not appreciate the thousands of lines in the network diagrams that slowed the poorly written Adobe Acrobat plogin to a crawl. T-Man attempts to organize nodes into a topology, when that topology is given. Under normal circumstances, a node would simply compute its neighbors as specified by the topology, and everyone would be as happy as a pig in mud. However, T-Man assumes that each node has a partial view of the system. This means that each node cannot just select its perfect neighbors. A very simple solution would be to select the "best" known set out of the neighbors known, and gossip about new neighbors. Neighborhood gossip will work extremely well here, since the topologies selected are bidirectional and a node will tend to "walk" closer to its optimal direction and reach it much more quickly than O(N). Some thoughts: does this work with any topology? I'm guessing not. From: emenese2@uiuc.edu Sent: Monday, March 24, 2008 8:59 AM To: Gupta, Indranil Subject: 525 review 03/25 Paper: Peer-to-peer membership management for gossip-based protocols Reviewer: Esteban Meneses (emenese2@uiuc.edu) This paper presents an alternative method to the full membership protocol. Instead of every node keeping a complete list of every other node in the overlay, the authors claim that having a partial view is enough to provide strong reliability requirements. Moreover, as the size of the group scales, the length of this partial view only increases logarithmically according to the group size. Now, because the IP multicast is not widely deployed, then an application level multicast turns out to be valuable for many systems. Gossip-based systems have been an interesting solution to that problem. In the general case, a gossip-based algorithm works in the following way: every node upon receiving a message selects a random subset of all the known nodes to forward the message. In this fashion, there is a high probability that all nodes receive the message sent. However, these protocols are based on the fact that all members must have a complete list of the whole group, but this is really difficult to maintain, specially in a huge group. One of the main contributions of this paper is how to compute the fanout (which is defined as the number of members selected as gossip targets from the partial view in order to provide high reliability). The protocol using this methodology has the following properties: scalable (the size of the partial view must increase slowly according to the group size), reliable (the partial views must provide a reliability compared to traditional schemes), decentralized (the partial views work by subscribing and unsubscribing members) and isolation recovery. To build an overlay with partial views, a node must first send a subscription request to a contact. The starting partial view consists of only the contact node. As the system grows, more nodes will be added into the view. When a node receives a subscription request, then it sends the request to all nodes in its partial view. Upon receiving that request the nodes can update its partial view or forward the request to other node. This is performed based on a system parameter called c, which is the redundant factor that provides a reliability guarantee. Each node keeps two lists: the partial view (nodes is sends gossip messages to) and the inview (nodes that it receives messages from). A couple of optimizations over this initial algorithm consists in introducing the indirection, which works when a node receives a subscription and it decides probabilistically to forward that request. Also, a lease mechanism is provided to create a more balanced topology in the sense that the partial view will be changing over time. The drawbacks of this proposal appear when we consider some of the probabilistic decisions in the protocol. First, a message can be forwarded infinitely around the system, particularly a subscription request. To solve this it is necessary to assign a maximum forward length, but to discover this value at the same time it is necessary to know about the system properties. Second, there is a chance of unbalancing in the graph creation. Third, there exists a chance for a node to be isolated from the rest of the group. In this last case, a heartbeat mechanism is introduced. On the other side, the protocol is resilient to node failures and packet loss in the underlying network. This paper proposes an interesting point about having only partial information in the nodes instead of a complete view of the group. Even with this limitation, the system is able to provide strong reliability properties. The same happens in a market economy when the information is only partial for many of the participants and the price offered by one participant is proposed on what it knows about the other participants. The idea of information being only partial makes the algorithm more distributed in the sense that there is no idea of total (central) information. The paper introduces an important idea, but locality is not an issue for this protocol. There are no measures about how this limitation can help or hurt the algorithm in the network layer. Paper: CYCLON: Inexpensive Membership Management for Unstructured P2P Overlays Reviewer: Esteban Meneses (emenese2@uiuc.edu) This paper introduces a system to construct overlays with several desirable properties. The authors claim that by using gossip mechanisms they can provide a reliable system that supports a high churn in the network. The basic idea strives in the concept of shuffling that keeps updated an overlay by an epidemic algorithm. Every node has a small dynamic set of nodes called neighbors. Occasionally, one node contacts other node in order to exchange some of the neighbors. The shuffle consists in selecting a subset of nodes form the neighbors list and exchange neighbors list with them. In the CYCLON system, the authors improved the basic algorithm by not selecting a neighbor randomly, but choosing the neighbor whose information was the earliest one to have been injected in the network. The main idea with this is to limit the time a pointer can be passed around until it is chosen by some node for a cache exchange and to impose a predictable lifetime of each pointer. With this change in the basic algorithm, they provide several properties. The connectivity, which means that the overlay is always connected. Also, the convergence property, which means that paths between any pair of nodes converge to minimum values. They changed the focus from the “knows about” version of the graph to the “can communicate” approach. On the other hand, they also improved the clustering coefficient, which is a measure on how well connected are the neighbors of a node. Although this is a measure for small-world networks, it provides a measure for how well the graph is connected. Another point in their design was to have a good degree distribution. The degree of a node is defined as the number of links to other nodes. There are several reasons to worry about it. First, the degree distribution is related with the robustness of the system to the failures. Also, it is an indication of how epidemics are spread. Third, it provides a view of how links are distributed among nodes. This distribution must be uniform. The advantages of this system stands in that there is a design issue on how to connect the nodes in the overlay instead of letting just the overlay to grow freely. Thus, the algorithm imposes several desirable properties for providing resilient to failures and optimizing the required steps to spread an epidemic message. However, at the same time, it has the limitation that many more messages are required to maintain such characteristics in the network. It is interesting to be aware that certain “unstructured” networks can have “structural” properties. With this in mind, several other optimizations can be obtained after such properties are provided. However, there is always a trade-off between more control, more flexibility and more efficiency. It is impossible to have all three together. I liked the paper, but I couldn't see any comparison of the proposed approach with previous ones. There is no clear idea if the scheme has a huge overhead or not to keep all the properties in the overlay. It would be valuable to have experiments in that direction. Also, there are no considerations about the locality in the network and how this issue can improve the efficiency in the overlay communication. From: rebolledodaniel@gmail.com on behalf of Daniel Rebolledo Samper [dreboll2@uiuc.edu] Sent: Sunday, March 23, 2008 8:03 PM To: Gupta, Indranil Subject: 525 review 03/25 CYCLON: INEXPENSIVE MEMBERSHIP MANAGEMENT FOR UNSTRUCTURED P2P OVERLAYS The article then presents CYCLON, an enhanced shuffling algorithm that is more efficient and resistant to churn. A shuffling algorithm basically keeps a partial view (called cache in this case) and periodically swaps it with another client and merges the two. The goal of CYCLON is to be highly resilient to churn, and particularly to massive node failures without losing connectivity. The CYCLON scheme differs from other shuffling algorithms in that nodes do not select peers to exchange views with randomly, but instead they select the nodes that have been in their views the longest, and the selected node is deleted from the partial view (cache): the links are reversed to the extent that the receiving node adds the sending node to its cache. Nodes don't update the ages of the nodes in their views when they merge them, but instead they keep the ages they already had, rather than replacing them with those they receive. The authors demonstrate through large-scale simulations that shuffling algorithms yield at equilibrium graphs with similar characteristics as random graphs (average path length and clustering coefficient). However, while the normal shuffling scheme yields the same in-degree variance as a random graph, the enhanced shuffling scheme's variance is lower, therefore distributing links more uniformly across the nodes. They then discuss node addition and deletion. New nodes fill their caches with c nodes located at the end of c random walks of length the (expected) diameter of the network. The random walks are initiated by the new node's introducer, and the selected nodes insert the new node into their views. The authors claim that since the topology is close to random, these random walks yield nodes uniformly at random, and therefore the statistical properties of the network are not modified. The tricky issue, however, is node deletion, and cleaning up the caches in a timely fashion. This is dealt with by allowing nodes to delete cache entries of unresponsive targets. The enhanced scheme has the added benefit that old entries are more likely to have become unresponsive than fresh ones: the authors verify this empirically. Further, the network is quite robust in that it takes relatively few cache entries to prevent (or significantly limit) partitioning. T-MAN: FAST GOSSIP-BASED CONSTRUCTION OF LARGE-SCALE OVERLAY TOPOLOGIES In this paper, the authors present T-Man, a gossip-based algorithm that takes as input a function (latency, inverse bandwidth, etc.) that (partially) ranks nodes relative to a given node (we can think of this ranking as a distance) and builds a distributed system topology that where nodes are close to each other according to this metric. Each node maintains a partial view of the system of size c. The goal of the algorithm is to populate each node's partial view with the c nodes closest to it. Another version may seek to simply pick any c nodes closer than a certain distance, if the ranking function derives from a distance. The algorithm works in rounds: every T seconds, a node will send its entire view to a node selected from the closer half thereof (according to the ranking function). It will then receive the selected node's view and merge it with its own, while keeping only the closest c nodes. The selected node will proceed likewise with the list it receives. The algorithm is guaranteed to converge w.h.p. in logarithmic time. Further, the authors present two optimizations to increase convergence time in the final phase of the algorithm (where it is normally quite slow): balancing, whereby nodes may refuse connections if they've already received too many, hence distributing view exchanges uniformly; and "optimized endgame" where closer nodes are selected more often for view exchanges (following an exponential law). They also present optimizations to ensure connectivity (by introducing the concept of "direction") and random peer selection for bootstrapping (by using a random topology service). The authors propose a conceptual separation between topology construction and the application logic: indeed, the topology only depends on the application insofar as the distance is adapted to the application. This decoupling allows their system to be extremely simple and efficient, but at the same time general enough that it can be used in many different distributed applications. Figure 2 is an absolutely stunning illustration of their results and very effectively drives the point home. However, it raises an important question: what happens if we select a suboptimal ranking function, i.e. one that doesn't reflect the underlying node distribution? The authors claim that their algorithm works even for irregular distributions, but whether this technique can be applied to a real-world scenario remains to be empirically validated. Finally, the authors do not tackle the effect of churn on their system.