From: Rahul Malik [rmalik4@uiuc.edu] Sent: Thursday, February 14, 2008 9:05 AM To: Gupta, Indranil Subject: 525 review 02/14 Submitted by: Rahul Malik Date: 02/14/08 rmalik4@uiuc.edu Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems Antony Rowstron and Peter Druschel SUMMARY: This is one of the classical papers in decentralized peer-to-peer research. In this, each node maintains a nodeId, which is randomly selected and is assigned to the node and is unique. The messages with a key are routed to the node with the numerically closest key. While traversing, it takes into account network locality and seeks to minimize the number of hops traveled. For the purpose of routing, it maintains a routing table, leaf set and neighborhood set. Pastry is resilient to node arrival and departure and it can withstand upto |L|/2 adjacent node failures of a node. The authors have performed the analysis of pastry on a simulator and have confirmed that the average number of hops traveled during routing varies as log of number of nodes. PROS: This is certainly one of the beginning papers in peer-to-peer systems. One of the strong points of the paper is that while routing the messages, they consider the physical locality. As a result, it improves the robustness of the system. Another advantage is that it is totally decentralized, scalable and self-organizing, thus not requiring any centralized support. CONS: One of the major drawback of the paper, which I think, is that they have not done studies of their algorithm on a real testbed. This is definitely needed as the kind of assumptions that they have made regarding uniform physical distribution of nodes spatially is not true in real world. Another weakness is that they have not described about the message overhead during nodes joining and leaving the system. They have not discussed anything about how fast or slow their system stabilizes. The churn rate discussed in the analysis is very low and is not representative of real systems. The system is also prone to malicious users and does not address security aspect. Also, it assumes no packet failures, which is not true in real life. FUTURE WORK: The authors should perform a more complete theoretical analysis in the paper to back up their claims made. Also, they should implement and test it on a real situation rather on a simulation framework. They should improve it for more significant churn in the system. I do not think so that their system is capable of handling a significant churn. Kelips : Building an Efficient and Stable P2P DHT Through Increased Memory and Background Overhead Indranil Gupta, Ken Birman, Prakash Linga, Al Demers, Robbert van Renesse SUMMARY: In this paper, authors have described the design of distributed hash table for peer-to-peer systems. Their main contribution is that their system is they tend to reduce the file lookup times and increase the stability in case of churns by increasing the memory usage and background communication overheads. They maintain k virtual affinity groups with the help of a consistent hash function such as SHA-1. For the nodes in the affinity group of a node, entries such as rtt estimate, heartbeat count, etc. are maintained. Also it keeps a small set of contact nodes for other affinity groups. They use heartbeat protocol to find whether a view, contact or filetuple entry stored is to be deleted or not. Heartbeats are disseminated using a gossip protocol. They have implemented an emulator for the system and results are provided for load balancing, file insertion and fault-tolerance. PROS: As claimed in the paper, their main contribution is that the message lookups are resolved with O(1) time and message complexity. Also, the lookup costs for each query are also O(1). As compared to other distributed hash tables, it does not to maintain some structure such as a ring in Chord or Pastry or routing tables. Thus, it saves additional overhead that one would incur while maintaining those entries. CONS: In the paper, they have not described the scaling of the system with increase number of files with each user. Their assumption is that each node has just one file, which is completely opposite to real life scenarios. As a result of increasing the number of files at each node, memory requirements at all the nodes will also go up. There is a need to address the privacy and security aspect of the system. Also the background communication overhead of the system is higher as compared to other systems. The functioning of central server used for joining is also not clearly mentioned. FUTURE WORK: I think that they should also take into consideration the spatial locations while assigning virtual affinity groups because that will reduce the communication overheads of the system. They should also more explicitly mention the functioning and updates at the central server for joining the system. I think that they should also address privacy and security aspect in future. From: fariba.mahboobe.khan@gmail.com on behalf of Fariba Khan [fkhan2@uiuc.edu] Sent: Thursday, February 14, 2008 8:53 AM To: Gupta, Indranil Subject: 525 review 02/14 Resilient overlay networks , D. Andersen et al, SOSP 2001 Summary: RON is a overlay network architecture on top of AS's that allows the application running on top to recover from network failures better. RON nodes monitor the the quality of paths such as latency, packet drop and available throughput and exchange these metric along route information. This allows RON to recover from failures in seconds rather than minutes. Authors use two Internetwide testbeds (12 node and 16 node) to test the performance of RON. RON uses a single intermediate node to reroute traffic if that provides at least 50% improved bandwidth and they get improved latency in 60% paths. Discussion: I avoided the whole motivation of unstable BGP routing here. Literature after this paper [Jennifer Jennifer Rexford et al, AT&T Labs, 2003] shows that BGP is quite stable at the core. Additionally experiments done with educational / research nodes might not be comparable to a home (ISP) user. The paper has a bigger contribution than trying to fix BGP. It shows us how to move on to application layer and use p2p to get the properties we want out of the network. Properties that might take years of standards and infrastructure update otherwise. Example for 2008: Voip. Excellent evaluation. To the point. I would refer to links as virtual links, the term "link-state" confused me few times. ------------------------------------------------------------------------ --------------------- Kelips, I. Gupta et al, IPTPS 2003 Kelips is a p2p DHT where under normal conditions file lookups are O(1) and membership changes are detected and spread very fast using gossip. These two benefits come at the cost of added memory and bandwidth overhead. Each node uses O(root n) memory. In Kelips each node is in an affinity group by (consistent) hashing its IP into the group number [0 – k-1]. Each node stores 3 sets. 1)view: rtt, heartbeat count (if it has been long may be it is dead) for all the other members in the affinity group. 2) contact: Does the same for few members from each foreign affinity groups. 3) A set of file-IP mapping for all nodes in his affinity. To lookup a file it is mapped to an affinity group using consistent hashing and querying node (q) contacts a node (a) from his list of nodes for that affinity group. A looks up his filetuple for the homeIP (h) of the file. Node q now fetches it directly from h. View, contact and filetuples are updated periodically using gossip. Discussion: Lookup is O(1) not 1 hop. Its constant number of hops. But as log scale very slowly I cannot see the benefit immediately. A related work section could have helped with that. It was not clear to me how Kelip deals with number of affinity groups k = root n changing. It looks like the assumption is that n is fixed (the whole IP space?). But IP's are not quite uniformly distributed. From: rebolledodaniel@gmail.com on behalf of Daniel Rebolledo Samper [dreboll2@uiuc.edu] Sent: Thursday, February 14, 2008 7:30 AM To: Gupta, Indranil Subject: 525 review 02/14 PASTRY: SCALABLE, DECENTRALIZED OBJECT LOCATION AND ROUTING FOR LARGE-SCALE PEER-TO-PEER SYSTEMS This paper presents the foundations of a DHT of nodes with 128-bit identifiers called Pastry. Pastry's main service is routing a message to the node whose identifier is closest to that of the message. To achieve this, each node keeps three structures: the leaf set, a (fixed-size) list of its nearest neighbors in ID-space and their IP addresses; the neighbor set, a list of its nearest neighbors relative to an Internet distance (like hop count) and a routing table. The routing table essentially maps the node and message's longest common prefix and the first b bits in which the message's ID differs from the node's, to a node who not only shares this common prefix with the message's ID, but also shares the following b bits (b is configurable, and the routing table may have empty slots). If a node receives a message not addressed to it, it forwards it to the node in the routing table we just described. If the entry is blank, it forwards it to a closer node (ID-space-wise) that shares the same common prefix (or a longer one). Nodes join the network by a special operation that requires all nodes along the path to its closest node (ID-wise) in the network to send it their routing tables (the initial and end nodes are required to send extra state). New nodes, as well as old nodes, update their state upon joining or when they detect failures. The number of hops required for routing is logarithmic in the number of nodes, as is the state kept by each server. The system they propose differs from Chord in many ways, but most notably in that it tries to use nearby nodes (in the Internet) when there are many nodes to choose from. In a real-world deployment, this would probably improve total latencies when compared to Chord. Another difference is that it is symmetric: it uses the concept of distance in ID-space, whereas Chord uses the asymmetric concept of node following an ID. It is interesting to note that the paper doesn't mention the words DHT, and the paper doesn't describe how to implement one with pastry. For example, they don't deal with how a key-value pair is transferred to new node who is closer to the key than any other node, and indeed they don't talk about key-value pairs at all. However, the paper has some weaknesses. For example, it suggests that keys be replicated in the vicinity of the node closest to it which raises two questions: first, how to synchronize several copies of the same (key, value) pair particularly in the presence of churn; and second, how to justify the apparent inefficiency of having several copies of the same pair in the network. Finally, their analysis of the effect of departing nodes does not take into account the temporal dimension (viz. churn) that is vital in real-world P2P systems. RESILIENT OVERLAY NETWORKS This paper defines an overlay network that allows packets to be routed around the direct internet path or "virtual path" between two clients if it degrades or breaks. The authors point to studies showing that this happens often enough that it is a legitimate problem, mainly because of the slow convergence of routing tables, the main culprit being BGP's aggregation of detailed intra-AS route information. The authors build a system that monitors links between the nodes and routes packets either directly or indirectly based on the current state of the link. Their framework can be used directly in the TCP stack (though in this case processing is outsourced to user space) without changes to regular applications. It can also be used to build higher-reliability applications that define their own notion of link quality (latency, packet loss, throughput, or a combination thereof) and hence influence the path selection process. Empirically, RON can route around most transient failures. This paper argues that the notion of link failure varies from one application to another: FTP can easily handle high latencies but they're a problem for web conferencing applications. It is interesting that they allow the application to integrate very tightly with the network stack, and therefore provide the best service possible in terms of what the application defines as good. Also, they use very clever scatter plots and CDF plots to show RON's performance when compared with the regular internet (particularly figs. 9, 14 and also 13, 15). Nevertheless, they show weaknesses that are not properly quantified in my view: from the graphs it seems that RON corrects abnormally latencies, at a cost of higher latencies in the general case. A useful metric would be, for example, the average increase in latency when RON underperforms (relative to the public Internet) and when it overperforms. Finally, an important problem with RON is its lack of scalability, as the state is essentially linear and the number of messages required for normal operation is quadratic. Consequently, this narrows its possible applications, a point which is not at all discussed in the paper. It would be interesting to see if it is possible to route around link failures with only a small number of "virtual paths" (say, logarithmic in the number of clients). From: Justin King [kingjkk@gmail.com] on behalf of Justin King [king1@uiuc.edu] Sent: Thursday, February 14, 2008 1:22 AM To: Gupta, Indranil Subject: 525 review 02/14 -------------------- Reviews by Justin King ---------- Pastry: Scalable, decentralized object location and routing for large- scale peer-to-peer systems Rowstron, et al. This paper presents Pastry, a fully decentralized overlay for p2p systems. Pastry provides very good (O(log N), with N the size of the p2p system) routing times based on using locality. In addition to good routing times, Pastry spends a significant amount of effort dealing with resiliency and membership churn, achieving impressive results in both areas. As I read the paper, I noted how similar it seemed to Chord. This was confirmed when the authors noted in the Related Work section that Pastry and Chord are “closely related”. However, Pastry makes better guarantees about message delivery time. Chord has better guarantees about routing table growth (which, in fact, doesn’t grow as the network does). However, I believe that given these two choices, Pastry is the better option, at least for home PCs and other machines with large amounts of RAM (and hard disk, if necessary) to store a routing table (a machine with 1GB of RAM to expend on a routing table can store a routing table with 10kB for ALL 100,000 nodes in their evaluation, or 1MB for 1000 nodes. This is clearly feasible, and I believe the tradeoff for faster delivery time is worth it on a PC- class machine). Positives - According to the related work section, Pastry has been used to implement at least two other real systems (papers for which were accepted to SOSP and HotOS), showing that it has real applications. I suspect that Pastry was developed, and its submission was secondary to that of the applications developed for it. However, I cannot be sure of this. Negatives - Pastry overlooks the problem of joining the network: it assumes that a node that wants to join knows the address of another node in the system. However, this cannot be possible in a totally decentralized system where no node is guaranteed to be up at any given time. ---------- Resilient Overlay Networks Andersen, et al This paper presents the concept of a “Resilient Overlay Network” (RON), which provides an application-level overlay on top of the internet infrastructure. Compared to Pastry, which is an overlay specifically targeted at p2p systems, RONs are designed for generic application usage, and employ specific techniques to route around failed nodes. The authors argue that their results, which seem to improve routing over in-network decision-making, indicate that some routing decisions are best left to the endpoints, rather than to the middlemen. Positives - The idea seems to be novel: while the paper admits that overlays are not new, nothing similar to a RON is extant (at least, that the authors mention) - Evaluation section, for the chosen evaluations, was extremely thorough. The authors collected, analyzed, and submitted huge amounts of information to make their case. Negatives - Their evaluation was for two cases: N = 12 and N = 16. I don’t feel that this evaluation gives us any sense of whether their RON implementation can scale to thousands or tens of thousands of nodes, as p2p overlays obviously intend to do. Interesting Note -Looking at their map in Figure 1, I note that none of the nodes in the RON tested are in the middle of the country (“Flyover country”). Though I doubt this has significant impact on their results, I wonder what sorts of reasons there was no involvement with Midwestern universities (for example, UIUC, UW-Madison, Purdue, and UT-Austin, all excellent schools, had no part, nor did any other Midwestern or Southwestern institution, and a single location in Utah represented the Mountain West). -------------------- From: dkassa2@uiuc.edu Sent: Thursday, February 14, 2008 1:11 AM To: indy@cs.uiuc.edu Subject: 525 review 02/14 Review 3: Title: Resilient Overlay Networks A resilient Overlay Network (RON) architecture which allows distributed Internet applications to detect and recover from path outages and periods of degraded performance is presented. RON can do this within several seconds while existing wide-area routing techniques take at least several minutes. RON is implemented as an application layer on top of existing Internet routing substrate. RON nodes use the functioning and quality of the Internet path information among themselves to decide whether to route packets directly over the Internet or by way of other RON nodes by optimizing application-specific routing metrics. A RON router uses the latency-minimizer, the loss-minimizer and TCP throughput-optimizer metrics to evaluate and select routing paths. In general the paper makes a good contribution in an effort to detect and recover from network failures. But I am not in favor of a network over a network unless RON is treated as a back-up part of the whole existing network. As mentioned in the paper RON brings a couple of scalability, security and implementation complexity issues. I believe that a similar failure detection and recovery can be achieved by using clever algorithms and techniques on the existing network with less complexity and overhead. A cross-layer algorithm where transport layer information can be used for routing may for instance be good. ================================================== Review 4: Title: Kelips: Building an Efficient and Stable P2P DHT Through Increased Memory and Background Overhead The paper presents an efficient peer-to-peer (p2p) distributed hash table (DHT) system called Kelpis to allow hosts to join and leave as well as insert and retrieve files. Kelpis reduces file lookup times and increases stability to failures and churn by increasing memory usage and constant background communication overheads. It uses a p2p gossip to partially replicate file index information. Kelpis can resolve file lookups faster and membership changes are detected and disseminated to the system quickly. In Kelpis, query rerouting is used for lookup when there are failures. The paper seems to explain its claim with sufficient details. However, I didn't understand how other p2p DHT systems with increased memory and comparable overheads compare with Kelpis. Apart from this I have no negative comment against the paper. From: qwang26@uiuc.edu Sent: Thursday, February 14, 2008 12:57 AM To: Gupta, Indranil Subject: 525 review-overlay & DHT Kelipes: Building an efficient and stable P2P DHT through increased memory and background overhead This paper explores how to reduce file lookup times and increase stability to failures and churn by increasing memory usage and constant background communication overhead. In previous DHT systems can acheive O(log n) storage at each node with O(log n) lookup cost. In this paper, authors reasonably increase the per-node memory to O(n^{0.5}) to realize O(1) file lookup cost, which is significantly faster than previous schemes. In particular, Kelipes partitions the node group into k subgroups where k varies as O(n^{0.5}), and at each node a O(n^{0.5}) memory of soft state is maintained. To lookup a file, the requesting node locates the subgroup where the file information is stored by using DHT, and then sends a request to the corresponding contact kept in its local contact list. On receiving a request, the node looks up its filetuples and returns the file's location information. Therefore, O(1) fileup cost can be met. In addition, by using epidemic dismission, Kelipes is robust to packet loss and node failures, and possesses very fast file updating. A possible future work is discussing some security issures in Kelipes. For example, authentication no membership, and DoS attack etc. Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems This paper discuesses application-level routing and object location in a potentially very large overlay network of nodes connected via the Internet. Pastry is completely decentralized, fault-resilient, scalable and self-organizing. It is designed as general substrate for the construction of a variety of p2p Internet applications like global file sharing, file storage etc. In Pastry, the expected number of routing steps is O(log N), where N is the number of Pastry nodes in the networks. The main idea of Pastry is that, when presented with a message and a numeric key, a Pastry node efficiently routes the message to the node with a nodeID that is numerically closest to the key, among all currently live Pastry nodes. Pastry takes into account network locality; it seeks to minimize the distance messages travel, according to a scalar proximity metric like the number of IP routing hops. Each Pastry node keeps track of its immediate neighbors in the nodeID space, and notifies applications of new node arrivals, node failures and recoveries. From: Mirko Montanari [mirko.montanari@gmail.com] on behalf of Mirko Montanari [mmontan2@uiuc.edu] Sent: Wednesday, February 13, 2008 11:50 PM To: Gupta, Indranil Subject: 525 review 02/14 Dear Professor Gupta, here there are my reviews of the papers for the CS525 class. Regards, mirko. CS525: Advanced Distributed Systems 02 / 14 / 2008 - Mirko Montanari Review of "Kelip: Building an Efficient and Stable P2P DHT through Increased Memory and Background Overhead" This paper describes a DHT-based p2p system that uses epidemic gossiping to keep up to date the view that each node has of the overlay. The system creates a set of k affinity groups and uses an hash of the pair to associates each node to one of the k groups. Each node maintains a list of the other nodes in the same affinity group. Also, every node keeps a fixed-list of references to a subset of the nodes belonging to each of the other k-1 groups. In order to keep track of the files stored in the system, each node maintains a list of the files available within the group: each list- entry has a pointer to the node that actually maintains the file. All the information about new nodes that join the system, new files that are available and nodes that left the network are disseminated within group and to other groups through gossiping. Queries are performed by hashing the name of the searched file and reducing the value in the range 0 ... k-1. This number is considered the id of the affinity group to contact: given that every node contains the information about the file within the group, the query is able to get an answer. When a new file is inserted in the system, its filename is hashed and the file is moved to the correct affinity group. The paper presents techniques that can be used to deal with node failures. The technique presented in this paper is simple and effective: is able to guarantee constant time file lookup without requiring any rigid infrastructure. Also, the query routing protocol is fast (in normal conditions it requires only one hop) and easy to implement in a real system. Information about the network locality is taken in consideration during query lookup, so the overlay network is not completely disconnected from the underlying characteristic of the actual networking infrastructure. A possible problem of Kelips is the choice of the constant k, i.e. the number of affinity group that compose the network. This work does not present ways to change at runtime the number of affinity group in the network, so this number has to be chosen at design time. The constant k depends on the number of nodes that forms the overlay: having few affinity group increases the number of within group references that each node has to store locally; having many affinity groups in a network with few nodes leads to the possible problem of having empty groups, with big problems in query routing: the algorithm does not consider this possibility. The choice of k might be a problem if the size of the network is not known before hand, and a big k in the beginning of the overlay network like could be problematic. Also, there is no estimation of the amount of bandwidth that is used by the system to keep the state current. The used bandwidth is probably a function of the churn rate, but having a relation between the two would be interesting. Having such relation would allow us to determine, given a maximum background bandwidth usage, which is the maximum churn rate supported by the system. ----------------------------------- Review of "Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems" This work presents the joining and query routing protocols in the Pastry p2p system. In Pastry, every node has a number id, locally computed with the use of an hash function, which ranges from 0 to 2^128-1. Without going into the details of the protocol, routing of messages through the network is performed by considering the ids as a sequence of digits with base 2^b (with b being a configuration parameter): every node keeps a table of with 2^b-1 entries, and each entry n in the table contains the reference to a node which shared the first n digits with the current node. When a message has to be forward to a specific node D (destination), the current node forwards the message to the node which entry in the routing table shares more prefix digits with the destination node D. This assured that D can be reached in about log_{2^b} N steps. The paper offers a detailed description of the protocol, also analyzing cases of failures. Also, the programming interface that the Pastry system exports to the applications is simple and well defined. The paper discusses possible solutions to the problem of having malicious nodes in the network, even if it does not provide results on the effectiveness of the proposed techniques. This work also analyzes the issue of network locality. Pastry proposes techniques for improving the locality of routing through a guided choice on which node a message is forward to. The emulation of the system seems to prove that these techniques are effective, but the experiments are run with an assumptions of euclidean distance between nodes, that is not completely realistic. it would be interesting to see an emulation with a real network topology. The choice of the configuration parameters of the network does not carry any justification: 4 has been proposed as a typical parameter for the constant b, but no relation with the number of hosts that compose the overlay is given. The experimental section does not analyze variation of the configuration parameters: only in one graph, where a network of only 10000 nodes is used, the parameter is reduced to 3. It would be interesting to see how the choice of the parameter affects the performance of the network in function of the number of hosts that are connected. This parameter seems to be important for the overall performance of the system and, given that sees not possible to change it at runtime, it would be interesting to have a method to choose it at design time. From: Zixia Huang [zhuang21@uiuc.edu] Sent: Wednesday, February 13, 2008 11:12 PM To: indy@cs.uiuc.edu Cc: Huang, Zixia Subject: 525 review 02/14 Paper Title: Resilient Overlay Networks Author: David Anderson, Hari Balakrishnan, Frans Kaashoek and Robert Morris Summary: The main contribution of this paper is to describe a resilient overlay network (RON) which allows distributed systems to detect and recover from path outages and improve current routing protocols. The goal of RON is to enable nodes to communicate information when problems happen, integrate routing and path selection, as well as provide a framework for the implementation of expressive routing policies. The paper also discusses implementation topics including the IP forwarder, routers, virtual links monitor. At the end of the paper, a detailed evaluation methodology is proposed. Pros: (1) Clearly defines the goals of RON and illustrate the goals in several paragraphs. (2) Discuss some potential problems and criticisms in section 7. Cons: (1) It is an application overlay, which may introduce additional packet length overhead. (2) Not clearly describe how the overlay network behaves when a link behaves somewhat unstable situations (like high jitters). I guess NOT really improve the performance compared to the original BGP. Paper Title: Kelips: Building an Efficient and Stable P2P DHT Through Increased Memory and Background Overhead Author: Indranil Gupta, Ken Birman, Prakash Linga, Al Demers, Robert Renesse Summary: The main contribution of this paper is to introduce a new system Kelips which can reduce file lookup times and increase the stability to failures and churn at the expense of increased memory usage and communication overheads. In Gnutella and Napster system, some nodes connected to high latency, low bandwidth links will pose a problem for the total cost of lookup. The authors claim that their Kelips system uses O(n^.5) space per node which can resolve lookups with O(1) time. This paper illustrates the background overhead when existing entries are refreshed through a heartbeating mechanism and updates are disseminated through gossip-style protocol. It also discusses the time and message complexity of file lookup and insertion. Several auxiliary protocols and algorithms are introduced briefly int the paper. Pros: (1) O(1) file lookup is achieved and stability at the presense of failure and churns is improved. Cons: (1) May need more paragraphs for section3, auxiliary protocols and algorithms though I understand the total number of this paper is limited. From: marefin2@uiuc.edu Sent: Wednesday, February 13, 2008 11:07 PM To: Gupta, Indranil Subject: 525 review 02/14 KELIPS: BUILDING AN EFFICIENT AND STABLE P2P DHT THROUGH INCREASED MEMORY AND BACKGROUND OVERHEAD Indranil Gupta, Ken Birman, Prakash Linga, Al Demers, Robbert Van Renesse Kelips is a loosely structured p2p DHT. The work shows better performance than other previous academic DHTs like Chord and Pastry. The nodes are divided into k affinity groups. Each node is assigned an identifier from 0 to k-1 using the hashing (SHA-1) on its own ip address and port number. SHA-1 ensures with high probability that the number of nodes in each affinity group is around (n/k) where n is the number of nodes and k is the number of affinity group. Each node maintains an affinity group view, a list of nodes lying in the same affinity group with the heartbeat counts, RTT etc. It also contains Contacts, a small set of nodes in other affinity groups and a Filetuple table that stores all the files in this affinity group with their homenode information. Nodes within the same affinity groups talk to each other using specially weighted gossip protocol. This protocol selects the gossip target with probability proportional to (1/rttr). The cost of it is O(log2n). Also Kelips uses gossip protocol to disseminate information from one affinity group to another. Each of such messages contains limited number of changed filetuples and membership entries. This limit doesn’t vary with n. Now when a file is inserted, first the file is transferred to any node in the proper affinity group depending on its hash value. That node then sends this file to any random node uniformly selected in the same affinity group with the insert request and a new filetuple entry is created and also inserted into the gossip stream. The node that gets the insert request, actually stores the file. So, when a lookup is originated for a file, a node directly sends the request to a node in the desired affinity group and that node knows which is the homenode for the file. So it returns the homenode address. Thus insertion and lookup can be served with O(1) complexity. When any lookup or insert request fails, the Kelip resends the query to multiple contacts or sometimes contacts are asked to forward the query within their affinity group bounded by TTL or sometimes it send the query to some other node in its own affinity group. Kelips is an upgraded version of p2p DHT in the sense that it reduces the lookup latency and insertion time for a file to a constant time. It shows satisfactory load balancing and robustness against many failures (even half of the nodes can fail). It does not need to maintain any structure or invariants. Also background overhead is low. Per hop memory requirements is a bit more but due to the invention of low cost high memory, this is an acceptable solution and does not affect the performance or scalability. Multi-hop/Multi-query makes the lookup successful even with low bandwidth and presence of partition. But Kelips does not keep multiple copies of a file as done by Chord. If the homenode of a file fails, then there is no way to find that file in the system. It can be fixed by replicating the file to a fixed number of nodes in the same affinity group. The lookup can access any of these homenodes randomly and return that address to the querying node. PASTRY: SCALABLE, DECENTRALIZED OBJECT LOCATION AND ROUTING FOR LARGE-SCALE PEER-TO-PEER SYSTEM Antony Rowstron1 and Peter Druschel This paper presents the design and evaluation of Pastry, a scalable, distributed object location and routing mechanism for wide-area peer-to-peer applications. Each node in the system is associated with 128 bits nodeId (calculated by hash function). Each node contains three sets of information. These are routing table, neighborhood set and leaf set. The routing table contains ceil(log2bN) rows with (2b-1) entries on each row, where b is a configuration parameter. Each row contains the IP address of one of the many nodes whose nodeId have the appropriate prefix. The choice of the node depends on route latency and locality information. It is assumed that the application provides a function that allows each Pastry node to determine the distance (measured as a number of hop) of a node with a given IP address to itself. A node with a lower distance value is assumed to be more desirable. The neighborhood set contains the nodeId and IP address of numerically M (=2*2b) closest nodes. Leaf set contains L (=2b) nodes where L/2 nodes with numerically closest larger nodeIds and L/2 nodes with numerically closest smaller nodeIds. For routing of a message, local node first checks the leaf set. If the message key lies between the leaf sets, then the message is directly sent to that matched node. Otherwise, it looks in the routing table and if the corresponding entry is not empty, forwards the message to that node. If the corresponding entry in the routing table is empty, local node sends the message to the numerically closest node of the message key from its neighborhood set. The probability of third case is very rare and so the expected number of routing steps is ceil(log2bN). Pastry also provides self-organizing structure. When a node joins the system, it gets its nearby node by the proximity matrix. Then the nodeId of the new node is routed in the same way as a message and ends up with the numerically closest node in the system. The new node then uses these end nodes and intermediate nodes and their information to build up its own routing table, leaf set and neighborhood set. Pastry routing procedure always converges, because each step takes the message to a node that either shares a longer prefix with the key than the local node, or shares as long a prefix with, but is numerically closer to the key than the local node. Pastry supports node failure until L/2 nodes of adjacent nodeIds have failed simultaneously. The paper also addresses the malicious node attack and partitioning of nodes. While forwarding a message, a node can select randomly any node from the list of nodes those satisfy the criteria. But the best choice is to use the biased probability distribution so that the average low latency path is selected. Also Pastry allows several retries if any of any routing fails. The node joining overhead and routing latency are logarithmic. Pastry doesn’t use the shortest path for routing, which can be used to improve the performance further. It could be strong support for Pastry if the authors could provide the comparison of this with other peer-to-peer system in the form of graph for lookup latency and networks bandwidth. Obviously it is better solution than Gnutella or Napster but Kelip can provide lookup and insertion with O(1) latency with less background overhead. AREFIN From: Riccardo Crepaldi [crepric@gmail.com] Sent: Wednesday, February 13, 2008 9:54 PM To: Gupta, Indranil Subject: 525 review 02/14 Kelips: Building and Efficient and Stable P2P DHT Through Increased Memory and Background Overhead This paper presents a novel DHT system, Kelips, that, by means of increased memory usage is able to achieve performance for lookups and insertion that is independent of system size. The motivation for this is that many of the other works on DHTs assume a similar cost for network and storage usage. However, given the consideration that in many cases network links experience high latency and small bandwidth, Kelips uses more memory to reduce the network load. The nodes are divided into affinity groups, using a hashing function that guarantees a good load balance. The same process is applied to filenames, as they are stored in a node belonging to its affinity group. Each node then stores partial information about other nodes in the network, either in its affinity group or not, and files that are stored into its affinity group. The query process takes one message in the best case, if the querying node contacts a node in the affinity group of the file that knows who is storing the file. If this is not the case the multi-hop query routing process guarantees that the look-up is successful with a number of hops that is independent of system size. The query can be routed to multiple nodes, or it can be routed to other nodes in the affinity group or, if none if this information is present, can be routed to other nodes in the affinity group of the querying node. The paper provides experimental results that show how the storage load balance is better than exponential, and that the queries rarely take more than 2 hops to be successful in a system reasonably big. The failure of half nodes in the network shows that still the queries fail only if the homenode of the query is failed. This paper address the realistic scenario where network communication cost is more expensive than the storage memory. One of the strongest features of the system is the non-dependancy of system dimension, that makes it scalable. The robustness to failures is impressive and proven by experiments. In the protocol description is not clear the reasons behind the choice of storing the file at the node where the TTL expires. No result about load balancing is provided in support to this choice. Also, if just the entry of a file was stored in the affinity group, instead of moving the whole file to a node in it, a lot of network load would be avoided. The paper is not clear about the motivation of this choice. ------------------------------------------------------------------ Resilient Overlay Networks In this paper the authors present Resilient Overlay Networks, an overlay architecture that is meant to reduce the recovery time after a path failure. Usually BGP routing algorithm takes several minutes to recover from an outage, while the proposed solution is shown to perform way better. The RON is built over the internet, and is supposed to be composed of a small number of nodes, that keep information about the path performance of each link with each other node in a database, as well as static routing tables to each other peers. When a data communication is needed between peers, RON decides if the standard internet routing is good enough or the RON overlay will select a node to act as relay. This design is trying to achieve fast path recovery, and it also allows tighter integration with application, being an overlay it is way more adjustable. Finally it allows to define more expressive policy routing rules. This could be very consuming in terms of data load, but the authors think that since the overlay is likely to be installed on few user machines this approach is still feasible. In the rest of the paper the authors present the design and the implementation of the algorithm. The evaluation section is maybe less well presented. The authors state that RON could recover from all paths failures in an average of 18 seconds, and that the network overload due to the routing tables dissemination and the channel probes is not too bad. However their methodology is not exactly repeatable and maybe some more experiment should be provided. A big challenge when using RON nodes to route traffic would be to keep a stateful connection alive when the route is switched from the original to a new one through a RON router. Finally the Architecture does not address the security problem assuming that all peers are trusted, and won't work if the peers are NATed. The authors point this issue though. From: Yusuf Sarwar [mduddin2@uiuc.edu] Sent: Wednesday, February 13, 2008 9:41 PM To: Gupta, Indranil Cc: ysarwar@gmail.com Subject: 525 review 02/14 Resilient overlay networks -D. Andersen et al, SOSP 2001 The paper proposes overlay network architecture on top of the current Internet in order to handle fast detection of path outages and recover from them within several seconds that usual Internet would require several minutes. The proposed architecture named RON (Resilient Overlay Networks) has the following features: - It has been observed that gateway routing protocols like BGP connecting networks with different AS take long time to update any path dynamics (e.g., link or router failure, change of link attribute), and usually the end hosts come to know of the path failure much later. So Internet applications running on top of this suffers from high latency and packet loss when any such event occurs. But it can happen that the same path can be reestablished if the data traffic is allowed to pass through an intermediate node. This is the key idea of forming the RON. - RON architecture forms an overlay network and each host is connected to all others by some virtual links, a real multi-hop path in the WAN. Hosts measure the link quality in terms of packet loss and latency by assigning a score to each link based on these two. Hosts also disseminate these information to other hosts in the form of link-state advertisements. - Hosts detect path failures by repeatedly sending probe packets to and receiving responses from other hosts. If the average packet loss during a certain period of time goes beyond some threshold, the link is considered as dead. RON recovers from the outages by forwarding the packets via other hosts. - In RON hosts act as both IP destination and IP forwarder (router). To forward IP packets via itself, each host builds a routing table with associated link metrics and destination-next hop mapping. RON hosts use classified routing policies to make routing decision considering various metrics (bandwidth, throughput) and quality constraints set by respective applications.. - Simulation results show that RON's performance in terms of packet loss, latency and packet delivery rate closely resemble to the Internet's performance, but RON can detect path outage much quickly than the Internet. - It is concluded that a big corporation having many hosts around the globe can construct a RON architecture on top of the Internet for better end-to-end performance, specially for conference like applications. Comments: - Overlay structures always work on a set of some predetermined hosts, and the hosts have got an internal agreement between themselves for special services. So, the approach cannot be generalized for common Internet services. - IP forwarding services are mostly done by routers in the Internet, and that's why routers are specially built for that. But when a host has to perform this forwarding and path selection tasks, it can be wondered whether it can be successful for operation of long time. Experiments conducted for pretty short duration (48/96 hours) or testbed experiments cannot warrant good results for real and live deployment, specially for longer operation. - When packets are routed by a RON host, it should encounter larger latency and longer processing delays. Also maintenance of RON nodes for perpetual operation is also a big hassle. Usually it belongs to the ISP's responsibility to manage routers in the Internet. ===================== Kelips: building an efficient and stable p2p DHT through increased memory and background overhead I. Gupta et al, IPTPS 2003 A peer-to-peer system is proposed where nodes manges the content in a group management fashion. Nodes are partitioned into several affinity groups, and every node keeps a (partial) list of other members in the same group, some (here, two) contacts to other nodes belonging to other groups, and filetuples that they all share in that group. Whenever a query is made, it's forwarded to the specific affinity group and the query is served. The system requires comparatively larger amount soft states and gossip based back-end processing to maintain these states. Pros: - Simple. - Lookup and insertions are claimed to be of O(1). - Highly adaptive to 'churn', frequent node joining and leaving. There are lazy processes in the background to maintain states across the system. So it converges overtime, still enabling lookup in the mean time. Cons: - Enormous amount of soft states, in the order of O(qsrt(N)). Three long tables, even the list of files shared in the entire group. The authors show a moderate requirement of somewhat 1.93M. This is not a data storage, it's the size of soft state of a single node to maintain of! - Consistent group membership overhead, and inter group communication. Lots of overhead and constant heart-beating processing behind to provide connectivity and consistency. - It is not clear how the inter group contacts are populated and maintained with newer contacts amid of joining and leaving. - The authors do not produce how large the overhead traffic looks like. - To attain an O(1) lookup (which is not guaranteed though), the cost seems be too high. Comments: - The authors entirely relies on gossip style state maintenance. Gossip is simple, workable, but how much reliable and available the service could be, it's a big question. - The affinity group could be constructed of locality aware or taking topological vicinity into account, rather than just randomly making nodes flock together. - The authors could be showing of some trade-off between the number of groups and individual group size. Theoretically it is shown as O(sqrt(n)) as the minimum state size, but there are lots of hidden constant in it. ===================== From: Hengzhi Zhong [hzhong@uiuc.edu] Sent: Wednesday, February 13, 2008 11:35 AM To: Gupta, Indranil Subject: 525 review 02/14 Resilient Overlay Networks Summary: Wide-area routing protocols take a few minutes to recover path outages and performance failures. A Resilient Overlay network is proposed to detect and recover path outages and performance failures quickly, in a matter of seconds. RON is an application-layer on top of the existing Internet routing substrate. It aggressively probes and monitors Internet paths and uses this information to route packets. Pros: 1. RON integrates routing and path selection tightly, which can incorporate application-specific metrics for faults and path selection. That is, it allows applications to influence path selection. 2. RON does fault detection and recovery in a matter of seconds 3. RON moves fault detection and recovery to a higher layer overlay, which allows faster response. Cons: 1. RONs detect problems by doing aggressive probing and monitoring. This increases bandwidth usage. 2. RON depends on the physical path redundancy. If there is not much redundancy, then RON is out of luck. 3. Each RON note must maintain information about alternate routes and select application-specific paths. What is the space requirement for maintaining this information? There could be many alternate routes. 4. Experiments show RON cannot overcome problems when the individual sites are unreachable from any other site in the RON. Would more RONs help? 5. RON must be small as to reduce excessive bandwidth from the aggressive probing. Scalability may be an issue. Kelips*: Building an Efficient and Stable P2P DHT Through Increased Memory and BAckground Overhead Summary: This paper uses increased memory usage and background communication overheads to reduce file lookup time and complexity and increase stability for high failure and churn rate. Kelips uses k virtual affinity groups. Each node in the affinity group uses a consistent hashing function to map the node's identifier into 0 to k-1. Each node contains contacts, filetuples, and affinity group view. They are stored in AVL trees. Kelips uses gossping to maintin consistency. Infomration propagates in affinity groups with O(log n) time. Background overhead is increased Pros: 1 memory usage is small for moderate sized systems (O(sqrt n) space for each node) 2 uses epdiemic multipcast protocol to replicate system remembership data 3 efficient file lookup (constant time and complexity) under normal conditions 4 fault-tolerant when chunk rate is high Cons: 1. refer to Pros 1, how bad would the memory usage be for huge sized system? In the PSP context, is a O(sqrt n) space significant? 2. how often do one-hop lookups or insertion fail? If there are many failures, what is the effect on file lookup time? It is only under normal conditions that file lookup is constant. 3. does kelips still work efficiently (efficient file lookup) when churn rate is high? Hanna Zhong