From: Mayssam Sayyadian Sent: Tuesday, September 21, 2004 12:24 PM To: Indranil Gupta Subject: 598ig review 9/21 Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility Antony Rowstron, Peter Druschel SOSP 2001 This paper introduces and explains PAST which is a large-scale distrubuted (internet based, p2p) storage management system (storing multiple replicas of files: archival storage and content distribution meaning it's not a general purpose file system). The goals of the project are mentioned to be: strong persistence, high availability, scalability, security. Four major components of the system are described: client's api, pastry (the p2p routing substructure), storage management and cache management. An interesting component is the Pastry which is the P2P routing substrate. The idea is adapted from Plaxton's algorithm. Being self-organized (decentralized) is an aspect of the system. In building the storange management, the challenge is in balancing unused storage among nodes and maintaining copies of each file near neighbor nodes. Experiments show that global storage utilization is improved to a great extent. In order to improve the system it might be possible to reducie replica diversion (e.g. node may forward to the next node or store statistics in routing tables). A very wise plugin to the system could be a directory service and something which needs to be considered at least for evaluation and experiments is the notion of master nodes (or proxies) over the network. Also the system may fail (decrease utilization) in case of very large files (that might be looked up rarely). SHARP: An Architecture for Secure Resource Peering Yun Fu, et al. SOSP 2003 Sharp is a framework for secure distributed resource management in Internet-scale computing infrastructure. Experiments show deployment of SHARP in PlanetLab. For distributed and p2p economies, after resources are discovered a reliable, accountable, and scalable resourceexchange framework must be established to provide the mechanisms for p2p resource bartering. SHARP is to address this challenge. It represents resource delegation using tickets, which assert that a peer (the holder) controls a set of another peer’s resources over some time interval (its term). Each ticket is signed with the private key of the resource owner (the issuer). A holder may delegate a portion of the resources claimed in a ticket to another peer by issuing a subticket for a subset of the resources over a subinterval of the term, signing and concatenating it with the original ticket as a new ticket. The cryptographically signed tickets are unforgeable, non-repudiable, and independently verifiable by third parties. The notion of over subscription, resource routing is presented, and security architecture is put forth for these issues. Scalability of reliable group communication using overlays F. Baccelli et al Infocom 2004. This paper is about scalability of reliable group communication mechanisms using overlays. In IP supported multicast, throughput tends to decrease as group size goes increases drastically. The idea here is using end-systems and have them forward incoming packets to downstream nodes using different unicast connections. The paper makes some assumptions such as buffers in end-systems are large enough for the storage. Also it states that the throughput of the reliable overlay group communication scales in the sense that for all multicast tree sizes and topologies, the group throughput is strictly positive when the saturation throughputs of all unicast connections are bounded away from 0. The paper also investigates scalability of packet delay and buffer occupancy, showing that the occupancy of the buffer and the latency in the end-systems explode with time provided no additional control. Then assuming certain moment conditions are satisfied by cross traffic in the routers, solves this problem to some extent by implementing a proactive rate throttle mechanism at the source . Finally some few important practical matters pertaining to the shaping of the trees and the required control mechanisms are discussed. One of the strong points of the paper is the general observation that maximization of the group’s throughput, the design of the protocol and the construction of the distribution tree should take into account the local maximum throughput of the TCP connections between end systems. This consideration could be useful in designing protocols and algorithms for the group communication using overlays which primarily focus on the network distances between end systems. But in the paper it's not discussed what to do when the buffer of a node is full, this node stops the communication coming from the upstream node. From: Jin Liang [jinliang@cs.uiuc.edu] Sent: Tuesday, September 21, 2004 11:05 AM To: Indranil Gupta Subject: 598ig review 09/21 Storage management and caching in PAST, a large-scale, persistent perr-to-peer storage utility This paper describes the PAST storage management system. PAST consists of peer nodes who contribute storage to the system as well as storing files in the system. PAST is built on top of Pastry. Each node in the system has a unique nodeID. Each file also has a fileID. A file is stored on a node whose nodeID is numerically closest to the fileID. Files are also replicated on k nodes whose IDs are closest to the fileID. Since PAST stores whole files, and nodes may have different storage capacity, load balancing is an important issue. PAST uses two techniques to balance load. The first is replica diversion. If a node belongs to the replica set of a file, but the node does not have enough free capacity (dictated by its policy), the node can select another node in its leaf set and store the file there. The second is file diversion. If replica diversion fails to find k nodes to store the file, the file is hashed to a new fileID (by choosing a different salt). If the system utility is relatively low, the file is likely to be successfully stored under its new ID. PAST also uses caching to store the file along the path from the query to the node that stores the file. Simulation results show that PAST can achieve high system utilization with relatively small file insertaion failures and diversion rates. A potential problem the paper is that it didn't evaluate the performance of PAST under churn. Since a file is replicated on k nodes whose nodeIDs are closest to the fileID, if some node in the replica set leaves, or some other node joins the set, files need to be migrated. The paper mentions that migrations can be done gradually, with pointers installed first. But this would make the protocol complicated and it's unclear how the system would perform, especially under multiple joins/leaves. SHARP: An architecture for secure resource peering This paper presents a resource peering system for resource sharing in a federated computing system such as a grid or a p2p system. In such a system, each node owns its resources, and each node can contribute some of its resources for global sharing. The question is how to securely share the resources without centralized coordination. SHARP uses resource tickets to grant resources to agents, and agents can delegate the resource claim to other entities. The delegation chain is signed by each agent, so it can be verified by any resource owner. A service manager can then redeem the tickets at some resource sites. To preserve site autonomy, however, resource managers can decline the tickets, which means the tickets only grant probabilistic resource rightes to the users. This also allows resource owners to over subscribe their resources, whihc increases the resource utilization. Scalability of reliable group communication using overlays This paper examines the scalability of reliable group communication when group members communicate using TCP connections. Previous work in IP layer has shown that the scalability of reliable group communication is poor. When the group size increases, the group throughput would not increase indefinitely. This paper uses the max-plus TCP model to model an overlay network consisting of TCP connections, they analytic and simulation results show there is always a positive gain in throughput. The paper also examines the buffer occupancy and delay of the packets. It shows that proactive rate throttle mechanism at the source can lead to finite buffer occupancy and packet delay. From: Michael Treaster [treaster@cs.uiuc.edu] Sent: Tuesday, September 21, 2004 10:53 AM To: Indranil Gupta Subject: 598ig review 09/21 PAST This paper describes PAST, a distributed storage system built on the Pastry network overlay. The system is intended for archival and persistent storage purposes rather than as a general-purpose file system. The key contribution of this paper is its storage management scheme. It employs a statistical scheme to distribute files throughout the network, and clusters files near nodes with node IDs similar to the file ID. It succeeds in attaining a high utilization of available storage space while rejecting few files through the use of replica diversion and file diversion, which are mechanisms for relocating and caching files on the network. The paper has two significant shortcomings. First, security is based on users possessing a physical key in order to prove their identity. This is a fairly heavy-handed approach to security. It likely to be uncomfortable for users, who are accustomed to passwords, and expensive to deploy. Also, should a user misplace his key he would lose access to his files. Secondly, although the authors describe a caching mechanism to improve access performance, they do not describe how the coherency of the caches is maintained. When a file is modified, the replicas are not updated to the latest version, and it would be easy for another user to accidentally retrieve a legacy version of the file without being aware that a more recent version was available. SHARP SHARP is an architecture for resource allocation in a distributed environment. The key contribution of this paper is twofold. First, the SHARP system manages resource allocation through a system of tickets and leases. Tickets represent a slice of time on a resource. Tickets are given to agents, which resell them to users. A ticket, however, is not a hard guarantee for access to a resource. Instead, it grants the user a soft claim to the resource, which is transformed to a hard claim, or lease, when the ticket is redeemed. Tickets can be oversold, like airline tickets. If a user attempts to redeem a ticket that is already in use by another user, he is rejected. The second contribution is the demonstration that overselling tickets is actually a desirable thing to do. They show that by overselling tickets, the system is able to make more efficient use of available resources. Since users will often request more resources than they actually require, resources would be left unused unnecessarily when a task completed early. Additionally, some users may request resources but then fail to actually execute their task. By overselling tickets to the same resource, SHARP is able to fill these spaces in scheduling that would otherwise be idle. This system might be adapted to manage high-performance computing resources, such as clusters. Two approaches might be viable. First, the system might be used to allocate resources within a cluster in place of a more traditional scheduler. This environment is similar to the PlanetLab environment the authors describe in that users request resources ahead of time and those resources are frequently only partially used. SHARP might also be used to allocate computation jobs in a grid environment spanning multiple clusters. In this proposal, clusters are analogous to the individual machines in the PlanetLab network and users request CPU time on the cluster. There would be some added complexity in this system due to the common use of a scheduler layer in cluster resource allocation, but this might be addressable using virtual machines or virtual processors as done in PlanetLab. Scalability This paper presents an approach to studying the scalability of overlay networks that are built on top of TCP. They use a theoretical approach to assess the scalability limits of these types of networks. Then they validate their theories by gathering experimental data in simulation and on real networks. The authors' studies focus on three performance metrics: throughput, latency, and buffering requirements. The authors' approach is not practical for all networks because it requires complete knowledge of the network to be usable. This knowledge might be difficult to obtain in several types of situations. The network might be extremely large. It might be partly or entirely controlled by another organization that is not interested in providing information about their internal network structure. Finally, networks might have highly fluctuating network topology, as is the case in most peer-to-peer systems. In all of these situations, the analytical method described by the authors will fail. Since these situations are not uncommon, the application of the authors' work is limited. From: Thadpong Pongthawornkamol [tpongth2@uiuc.edu] Sent: Tuesday, September 21, 2004 10:50 AM To: indy@uiuc.edu Subject: 598ig review 09/21 Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility The paper proposed by Rowstron & Druschel presents and evaluates the distributed storage management system on top of Pastry , a digit-based-routing distributed hash table which guarantees O(log n) lookup time. PAST tries to apply concepts of security, locality, as well as availability into its storage management system. The main concepts presented by PAST is how to balance the load in storage system in order to gain maximum storage utilization and availability among heterogeneous nodes with storage imbalances. In real world systems, it is possible that there are such inequalities in sizes of node storages, sizes and popularity of files. The operations provided by PAST are insertion, look up, and reclamation ( weak deletion). However, PAST is lacking of strong deletion and update. PAST also assumes that objects stored in a PAST system are static and cannot be changed. All operations in PAST apply the same routing concept from PASTRY. Each node in PAST system has its quasi-unique NodeId. Such concept is also applied to every file with global FileId. Any insertion/look up/reclamation originated from an arbitrary node will be routed by comparing each digit of FileId of requested file. It is trivail to see that the number of hops used to route a message equals the number of digits of FileId. For performance of system utilization, PAST always create a number k of replicas of each file inserted to PAST system. All k replicas will be routed to store at a set of k nodes with FileIds that are nearest to FileId of that file. Replica diversion and file diversion are adopted to solve storage imbalance in the case that some of such k nodes are unable to store the requested file. Also the concept of caching is applied to maximize the utilization of unoccupied space in the system. In conclusion, PAST can achieve a certain level of availability and performance for a distributed storage network. However, there might be some questions about its reaction to dynamic behaviors, such as churn or severe failures. Moreover, it is still lacking of update, deletion and versioning operations that should be very useful to apply with files in storage systems. SHARP : An Architecture for Secure Resource Peering The paper proposed by Fu et. Al introduces a new concept of global resource management which allow any identities in network to share their resources among them in a distributed, decentralized, well-defined manner. The SHARP system comprise of site authorities (resource holder), service managers (resource requestor), and agents (negotiator). With the concept of tickets (soft-state claims) and leases (hard-state claims), SHARP allow service managers to borrow powerful resources at site authorities via agents. With the concept of resource delegation, SHARP allow any principals which are holding local and remote resources to sublease their current resources. Thus provide extensibility to the system. Security is another main consideration in this paper. This paper describes many possible threats and attacks which can be occurred in decentralized system. It also proposes many mechanisms, such as public/private key pair, conflict resolve algorithm. The paper also mentions Sybil attack, a malicious node with several forged identities, and how to deal with such problem. Another considerations described in the paper is resource availability and efficiency. SHARP uses two techniques to deal with such a problem. The first technique is timed resources. Any granted resources have expiration dates. This methods guarantee that no one can control any portion of resource forever. The second technique is oversubscribing, issuing more tickets than available resources. This concept is very useful because it provides resource redundancy in cases of failures. However, it must be used carefully because such technique tends to increase rejection rates due to resource deficiencies. In conclusion, this paper presents an overview of distributed resource management systems, which can be very useful for economic model. However, many concepts seem to be generic, high-level policies. More detailed mechanism should be defined clearly. For the example, how each entity compares resources when it decides to barter or trade resources. Scalability of Reliable Group Communication Using Overlays The paper proposed by Baccelli et al. gives a mathematical prove about how to achieve scalability in reliable group communication using overlay network using direct TCP connection. The prove is done based on the max-plus presentations. There are three main goals mentioned in the paper : throughput, delay, and buffer occupancy. The result obtained from this paper surprisingly contradicts to other previous papers. The result show that scalability on reliable group communication can be achieved by limiting transmission rate on every nodes to the minimum of the local maximal throughput of all overlay edges. The paper also tries to solve scalable buffer problem, which can be solved by message rate control. In the protocol design section, the paper introduces an algorithm to create optimal tree which has maximum of the local maximal throughput at bottle neck. From: Yookyung Jo Sent: Tuesday, September 21, 2004 10:37 AM To: Indranil Gupta Subject: 598ig review 09/21 PAST : Summary : PAST is a distributed storage system on top of Pastry which is a DHT P2P. It provides file insert (with specification of how many replica should be created), file retrieve(with fileID), storage reclaim. In fact, the underlying Pastry provides much of the functionality for file storage already. What PAST adds to that is the enabling of high utilization (system behaves well, when reaching the maximum allowed storage capacity), by the mechanism of replica diversion and file diversion. Also PAST provides a caching mechanism for faster lookup. Following are a few more details. PAST's k replica for a specific file are located among nodes closest in nodeID values, which ensures a truly diversed replica location and condition, because there is no correlation in numeric value of nodeIDs in original Pastry. To reduce the rate of failure of file insert at high utilization, the system performs replica diversion and file diversion. replica diversion is, when not all k nearest nodes can accomodate the file, a node failing to accomodate the file contacts other numerically close node to divert the replica and maintain the pointer. file diversion is, to abort the insertion and try the insertion with new fileID (try new nodeID space). This results in balancing different portion of nodeID space. The cache policy of the system is that a file is cached at each of the nodes in the query or insert route, with fixed size. Their experiment shows that their replica diversion, file diversion, caching achieves the performance claim that they made. Comments : It should be noted that with the current functionality of file retrieve they provide (retrieve with fileID, not filename), the system is for archival storage or content distribution utility, not a general distributed file system. And it should be also noted that their main contribution over the original functionality of Pastry is effective, only when the file storage gets close to the system storage capacity. There is a high potential that the system might exhibit problems when a node addition or failure happens intensively. For example, In case of node addition(system growth), a node with diverted replica may be located outside the leaf set of the originator, that is outside the heartbeat check. They suggest gradual migration as a solution. But, it seems that the specification of the system behavior and analysis is not enough for churn, and they have not provided experiment for this either. They provide, in some sense, Freenet-style caching. Then, the path-lengh for file lookup and insert is critical in system performance. In their experiments, path-length was less than 2 hops, which is good. But, it should be noted that path-length is, largely, determined by typical query hop length, which is controlled by Pastry's base b with the trade-off regarding maintenance cost. SHARP : Summary : SHARP is a comprehensive architecture of distributed resource management system. It is decentralized and tries to provide efficient resource management, by paying attention to resource availability and utilization issue and security. The network of resources are segmented and each site's resources are maintained by the site authority. And there are agents who mediate resources. Their key construct is two-phase(ticket-lease) resource management. Tickets are soft claims(no guarantee) for resources, that can be recursively delegated from the root(the site authority). Any entity trying to secure a resource first gets the ticket from any agent who has the ticket and redeem it at the site authority. Tickets are timed and over-issued (oversubscription: total resources by tickets can exceed the actual resources). Compared to simple resource request, this two-phase mechanism enables better resource discovery (more agents advertising with oversubscription), better resource availability (overbooking works when not all ticket holders redeem. Resilience to node failures) They employ digital signing and other security measures and analyze how the system cope with the expected security threats. Comment : It can be problematic that No central authority site that certifies the authenticity for security keys. The paper is about abstract yet very broad specification of resource management system, with no hard proof of its features. Thus, it needs an extensive empirical validation, more than what is presented in the paper. Specifying the tiket lineage in the recursive delegation of tickets is a good idea. The paper has presented how it can be used for accountablility of malicious nodes. More generally, with the feedback mechanism enabled by this lineage, many interesting things such as dynamic and peer-cooperating way of adjusting the amount of oversubscription could be done. The system assumes reasonable degree of clock synchronization. Scalability of Reliable Group Communication using Overlays : Summary : This paper presents the theoretical analysis of the scailability of multicast using TCP overlay. Their analysis considers throughput, buffer size, latency for scailability. Assuming infinite buffer size, they show that the throughput is the minimum of local maximum TCP throughput of overlay edges, in chain topology. Thus, if all overlay edges have positive throughput, then the total throughput is positive, regardless of how large it is (=>scalable). They also prove it for tree topology with further assumption. Next, they show that, by rate control(sending packets with intervals, instead of sending them all at a time), bounded buffer size and bounded latency can be achieved. Finally, they presents what their analytic proofs imply to the actual system design in two ways - in designing the optimal multicast tree (maximize the minimum edge throughput) and in rate control -. Comment : Certain difference in TCP multicast and IP multicast is abstracted and plugged into the proof of scalability of TCP multicast. I might need more network background to understand exactly what this is. From: Jungmin So [jungmin.so@gmail.com] Sent: Tuesday, September 21, 2004 10:35 AM To: Indranil Gupta Subject: 598ig review 09/21 Paper Review - September 21 - Jungmin So A peer-to-peer system, as opposed to a client-server system, is where each node acts as both a server and a client. This kind of system can be built on top of internet using overlay, a virtual topology established using underlying physical links. Because of its decentralized nature, peer-to-peer systems are in general more tolerant to partial failures in the system, achieving higher availability. On the other hand, due to the same decentralized nature, coordination among the peers is a non-trivial problem, although they are crucial in achieving efficiency and security. The papers that are reviewed today shows three major applications of peer-to-peer systems: file sharing, resource sharing and group communication. All of these papers essentially deal with the issue of global coordination among peers in the system, to achieve a global goal. The first paper by Rowstron et al. [1] proposes PAST, a storage utility that runs on top of Pastry. Pastry is a peer-to-peer system that has logarithmic insert and lookup cost. If applied for file sharing, Pastry distributes files among the peers so that the number of files are about equal in all nodes. Also, the routing mechanism efficiently finds files in the system when there is a request. The files are replicated to resist failures. The basic Pastry system does not consider the difference in file sizes and storage capability of peers. Accounting for these differences, balancing the number of files in each peer is not a proper way to achieve load balancing. This paper proposes replica diversion and file diversion to improve load balancing. Whenever the amount stored in a peer exceeds a certain percentage of the node capacity, the peer asks a nearby peer if it can store the file. Also, in Pastry, there is a fixed number of replicas for all files in the system. To improve efficiency and availability, this paper proposes caching of popular files. If peers have capacity left empty, it can use some space for caching files. Cached files are different from normally stored files in the sense that if there is not much space left, these cached files get wiped out. So it is an optimistic caching, but it improves efficiency substantially. The second paper by Fu et al. [2], uses peer-to-peer systems for resource sharing. The critical problem in sharing resource is to prevent malicious and selfish behaviors. The proposed architecture, SHARP, uses ticket systems to achieve the secure resource sharing. Each peer has a site authority which manages the resources that a particular site has and is willing to share. There are agents who act as brokers between users and the owners of the resource. So a user needs to get a ticket from an agent and then contact the site authority to "lend" a resource for a finite time. SHARP uses oversubscription to avoid waste in the resource because of the claimed but unused tickets. If there are conflicts because of oversubscription, site authority has the right to reject the request for resource. The final paper by Baccelli et al. [3], considers peer-to-peer systems used for application level multicast. Instead of using IP multicast, peers establish multicast tree on top of overlays and use point-to-point TCP connections to deliver multicast traffic. The overall group communication throughput is the throughput of the bottlenect edge in the multicast tree. So this paper provides an algorithm for building a multicast tree to achieve the group throughput as high as possible. It also argues that if the sending rate is less than the expected group throughput which is the throughput of the bottleneck edge, the throughput can be achieved with finite buffer size and finite delay. By having the multicast tree above the transport layer, the application level multicast and achieve reliability using TCP. However, the multicast tree built on top of an overlay is tend to be less efficient then a multicast tree built at IP layer, because it is more difficult to consider the underlying physical topology to build an efficient tree structure. [1] A. Rowstron et al, "Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility, SOSP 2001. [2] Y. Fu et al, "SHARP: an architecture for secure resource peering", SOSP 2003. [3] F. Baccelli et al, "Scalability of reliable group communication using overlays", Infocom 2004. From: Jintae Kim Sent: Tuesday, September 21, 2004 9:51 AM To: Indranil Gupta Subject: 598ig review 09/21 CS598IG Review 9/21 Jintae Kim (kim28@uiuc.edu) Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility, A. Rowstron et al, SOSP 2001 PAST contributes a distributed file sharing system layered on top of Pastry. Each file is stored in the system as a complete unit and replicated at the k nodes closest to the fileID, derived from the name, user ID, and a salt; the salt value is used to allow a client to attempt insertion at different parts of the Pastry key space. PAST guarantees that files will remain in the system until deleted; files are not directly writable. PAST requires several mechanisms to achieve load balancing. Replicas may be diverted away from the k closest node to a key, using forwarding pointers. This diversion potentially increases the latency of lookup, and so large files are preferred for diversion over smaller files. Diversion is performed in advance of storage pressure to smooth out degradations in performances over different system load levels. PAST has been shown to be reliable in terms of the accessibility of files in the presence of failures. However, the system has difficulty coping with large files, and any size of file when load increases beyond a certain point. SHARP: an architecture for secure resource peering, Y. Fu et al, SOSP 2003 SHARP provides a secure architecture for exchanging resources on pair-wise peering relationship, which enables distributed systems to arbitrate access to global resources. It is not only a decentralized system that no central certification authorities exist, but also a secure system that employs cryptographic tickets and leases flexible to a broad range of policies and resilient against a number of attacks. SRRP, Secure Resource Routing Protocol, has been implemented to discover optimal paths for target resources based on changing metrics. In this protocol, there may not be sufficient available tickets along particular path to obtain target resources. This leads SHARP to adopt the concept of oversubscription, which allows authorities and agents to inflate the number of available tickets. The experimental results show that oversubscription helps to enhance the availability and efficiency of this system. Scalability of reliable group communication using overlays, F. Baccelli et al, Infocom 2004 This paper presents a mathematical framework, which proves that overlay-based reliable group communication using TCP is scalable. The authors model the overlay group communication on several topology such as chain topology and tree topology with different types of links. They first show that the throughput is scalable unlike the IP supported multicast. Afterwards, it is shown that buffer size and latency can also be scalable under the condition using rate control, through both mathematical analysis and experimental results. The main contribution of this paper that I think, though, comes from its implication on the overlay protocol design. In order to maximize the group throughput of the overlay networks, the protocol and distribution tree should consider the local maximum throughput of the TCP connections between end systems, which can been neglected more easily than network distances between them. Also, the result implies that rate control should be considered in the design of efficient and effective reliable overlay multicast, because it provides scalable throughput and buffer occupancy. From: James Richard Newell Sent: Tuesday, September 21, 2004 9:21 AM To: Indranil Gupta Subject: 598ig review 09/21 P2P Apps - PAST, SHARP, Scalability Using Overlays James Newell 9/20/04 The first paper Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility by Rowstron and Druschel describes the architecture of a peer-to-peer distributed storage network. This design is called PAST and is used for persistent storage on a non-centralized overlay network that allows for file queries, insertions, caching, and replication. PAST is developed on top of a routing protocol called Pastry which routes messages "efficiently" to the nearest node using O(log) hops. This is done by keeping a routing table of hashed fileIDs and jumping to the node with a closer nodeID than itself by at least one digit. During insertion, a node will specify k which dictates how many copies of the file will be produced on alternative nodes. In addition, copies can be cached during routing on unused disk space temporally to increase efficiency. The authors use the GreedyDual-Size policy to implement file caching. PAST also tries to increase efficiency by proactively maintaining uniform distributing of files using techniques of replica diversion (leaf space) and file diversion (node space). These diversion tactics are primarily used for larger files to leave space for multiple smaller ones. Nodes management is maintained through keep-alive messages (for failure) and routing table messages between leaf nodes. Finally, the authors do some evaluations on a system implementation and discover optimal parameters (for diversion) and determine maximum storage utilization without performance degradation. PAST seems strikingly similar to the other P2P storage system we already read, like Chord. They both use the logarithmic hops to locate file location. In large networks (thousands) this actually can be quite costly. It also would have been nice to see some metric on fault-tolerance and performance during node failures. Personally, I think there were a lot of assumptions about the effectiveness of Pastry and its self-organizing ability. The next paper is SHARP: An Architecture for Secure Resource Peering by Fu, Chase, Chun, Schwab, and Vahdat. SHARP is a quasi-economic resource management system that allows for effective and secure distribution of remote resource sharing across a network. The paradigm behind the system is the principle-agent system using claims to contract resource leases. A site can authorize a resource to be made available by authorizing an agent to grant tickets to wanting parties. The ticket is a probabilistic affirmation (like a flight reservation) that the receiver can obtain the resources. The receiver will then verify the ticket to obtain a concrete lease for the resource, or delegate this ticket to other third-parties. This system allows agents to "over-subscribe" resources to increase utilization on the network's resources. However, a trust system is in place to ensure accountability of rejected claims and allow compensation. To implement this idea and detect conflicts the ticket need to be self-verifiable and secure. This is done using a system of peer-relationship data and digital certificates using private keys. The availability of peer relationships is made known using a complex routing mechanism known as Secure Resource Routing Protocol. The authors then go on to evaluate the performance and resource utilization of oversubscription using a simple implementation on PlanetLab. I think SHARP is a good system towards effective resource distribution, which is crucial for any GRID implementation. I think the idea of using an economic system is a nice idea, and oversubscription is alright in theory. However, how do you determine how much you allow oversubscription to take place? I can imagine every network would be different in this regard and hence could be susceptible to large overhead if guessed wrongly. I was also confused as to how one would dictate the policy as to ticket delegation and bartering for a given resource. They seem to lack an expressive mechanism to accomplish this. The final paper is Scalability of Reliable Group Communication Using Overlays by Baccelli, Chaintreau, Liu, Riabov, and Sahu. This paper is primarily a proof paper about the effectiveness of using TCP overlay multicast as opposed to IP multicast. The author claim that the TCP multicast has a positive throughput as group size scales to infinity while IP multicast goes to zero. This is done by maximizing the group's throughput by taking into account of the local maximum throughput of the TCP connections between end systems. They also determine that by using pro-active packet throttling, one can maintain finite latency and buffer occupancy under scalable conditions. They show this idea has general link to the type of hydrodynamic limits found in percolation and particle systems. Using the Legendre transform of some hydrodymanic shape, one can calculate the behavior of buffer contents when the size of multicast group grows. This paper adds to the confirmation that overlay multicast networks are effective as reliable distributing messages even though there are potentially redundant links. This paper's particular contribution seems to address the scalability of overlay networks. Unfortunately, it shows that multicasts are sensitive to congestion, therefore they recommend message throttling to ease the buffer and latency load of the network. However, this means that these types of networks are vulnerable to malicious nodes and DoS attacks. From: Adeep Singh Cheema Sent: Tuesday, September 21, 2004 9:13 AM To: Indranil Gupta Cc: adeep.cheema@gmail.com Subject: 598ig review 09/21 09.21.2004 Cheema, Adeep S p2p Apps cheema@uiuc.edu [PAST - p2p storage] PAST is essentially a self organizing, Internet based overlay which allows storage management and caching. Storage nodes cooperatively route file queries, generate redundancy and build caches. Files are assigned to nodes based on a dual ID system. A file is stored at a node whose ID is the closest match to the file. A file cannot be inserted multiple time with the same fileID. This is a fairly balanced way of storing files. Nodes are assumed to have identical capabilities and responsibilities and communication is assumed to be symmetric. Goals of the system are providing robustness, high availability, scalability and security. The Pastry protocol is used for efficient routing of requests in PAST. A client must know the fileID to retrieve a file, and searching is not allowed along with directory lookup or key distribution. Insertions [with replication], retrievals and pseudo-deletions [without total removal guarantees] are supported. PAST is layered over Pastry which is a p2p layer providing scalability, efficient routing, resilience and self-organizational capabilities. It enables routing to a node likely to contain a file [closest ID] when supplied with a fileID. Pastry offers a reasonable robustness guarantee and only fails to route if a large number of nodes fail simultaneously. Storing comprehensive routing tables at each node ensures this In extreme operating conditions – high resource load on nodes, techniques like replica diversion [inserting at a lead node] and file diversion [recomputing fileID] are used to balance load over peers. Caching in PAST is done using the unused portion of advertised peer disk space. Experiments show that PAST perform well in the presence of high demand is able to meet its goals through effective caching and load balancing. [SHARP - Secure Highly Available Resource Peering] SHARP is an infrastructure for resource peering - the location independent trading or contribution of resources among peers on an Internet scale network. Resource requests are securely represented as claims and these claims which are subdivided and delegated to resource managers by the system. The system assumes a model where every heterogeneous site or domain runs a local scheduler for its own physical resources. The system attempt to develop abstractions in order to allocate resources across the system in a fair and predictable fashion while ensuring that smaller units of the system still maintain autonomy. Other goals include making resource claims flexible and secure and ensuring safety of resources. SHARP decentralizes control over resources by appointing multiple resource managers which control different though overlapping areas of the resource pool. This ensure that there is no central point of failure or trust. Claims made by nodes for resources are timed and expire after a preset interval, and are probabilistically guaranteed. Agents are allowed to oversubscribe resources to improve efficiency. Membership is fairly flexible unlike other historic resource sharing protocols. Locking of resources is done in two phases, with tickets, which come with soft guarantees, and leases which come with hard guarantees to a resource slice. There are several roles for Managers / Agents for performing different kinds of allocation as well as for bookkeeping. Security is an important concern for SHARP and is incorporated by taking concrete measure against the most likely and common attacks imaginable on such a system. Oversubscription is used as a tool to increase efficiency by allocating more soft resources [Tickets] than actually held in the resource pool, with probabilistic guarantees of claims. [Scalability of Reliable Group Communication Using Overlays] This paper is regarding mechanisms that use TCP connections for overlay based communication between different nodes. It establishes a mathematical framework based on TCP to address scalability issues of group communication with respect to criteria like throughput of group communication, packet delay and buffer requirements. Through mathematical modeling of TCP, the paper is able to achieve the following results: The throughput of an overlay is determined to be the minimum of the local maximum throughput of overlay edges. For an arbitrary tree, the throughput of group communication is defined as the minimum throughput observed at the end-systems. In addition though, transfers of the multicast tree in another reference path does not affect throughput. It is necessary to control rate of transmission at the source for better local latency and bounded buffer occupancy. This was confirmed experimentally and bugger occupancy was determined to be strikingly low. The paper suggest that overlays be designed with their results in mind, i.e. they should accommodate protocols for finding spanning trees that maximize throughput for group communication and should incorporate rate transmission measures as well. From: William Conner [wconner@glsn33.ews.uiuc.edu] Sent: Tuesday, September 21, 2004 8:05 AM To: Indranil Gupta Subject: 598ig review 09/21 STORAGE MANAGEMENT AND CACHING IN PAST, A LARGE-SCALE, PERSISTENT PEER-TO-PEER STORAGE UTILITY PAST is a peer-to-peer global storage utility that aims to provide strong persistence, high availability, scalability, and security. PAST is built on top of Pastry, which handles the peer-to-peer routing and content location. In PAST, each stored file has a 160-bit fileId that is obtained from a secure hash. Each PAST node also has a 128-bit nodeId that is obtained from a secure hash. To insert a file, it is stored at the k PAST nodes whose nodeIds are numerically closest to the most significant 128 bits of the fileId for that file. In this case, k is the number of replicas that the user requests. File lookups are done by a series of routing steps in Pastry where each node forwards the message to the node with a nodeId matching a longer prefix of the fileId or a node with a prefix match the same as the current node, but closer numerically to the fileId. An interesting feature of PAST is how it handles storage management when the system approaches its maximum utilization. If a node can't store a replica of a file, then it attempts replica diversion by requesting that another node in its leaf set store that file. If none of the nodes in a leaf set can store a replica of a file, then the system attempts to re-insert that file under a different fileId. This is referred to as file diversion. Experiments demonstrate that PAST can achieve global storage utilization of 98% or greater. One problem with PAST is that it doesn't address how it provides strong incentives for users to contribute to the system. It mentions that providing storage and routing are optional, so it is unclear how PAST handles freeloaders. It seems like PAST's use of quotas could somehow be used to provide incentives for users of the system. It also seems like PAST doesn't have a very convincing way to handle churn in its system. The authors propose a quick fix (i.e., pointers in file table) to avoid a new node requesting all replicas right away and state that migration is gradual in the background. It would be interesting to see how this handles churn. SHARP: AN ARCHITECTURE FOR SECURE RESOURCE PEERING SHARP is a new approach for secure, flexible resource management for wide-area networked systems. One of its primary features is its use of cryptographically protected resource claims that can be delegated from one entity to another in a secure fashion. In SHARP, each network service runs with a slice of global resources allocated to it. In order to obtain a slice, each service has a service manager that contacts an agent. In order to supply resource tickets to service managers, agents contact site authorities. Site authorities at each local site control the local resources of that site. Tickets only guarantee the allocation of a resource with some probability. Ultimately, each resource consumer will have to supply a site authority with a ticket in order to acquire a lease for resources. Leases are guaranteed resource claims. The use of tickets and leases allows agents to oversubscribe resources. This is supposed to increase resource availability and resource efficiency (depending on the degree of oversubscription), because resources can still achieve near 100% utilization even if some ticket holders fail. However, there is a chance that fewer nodes than expected will fail and then some ticket holders will be denied when they try to redeem their tickets. In this case, the ticket issuer is responsible. Since all resource claims and rejections are signed, it is possible to hold sites accountable for fraudulent or malicious activity. Although the authors did some experiments on oversubscription degree for a particular configuration, it would be important to develop some general principles on how other application developers can choose a good oversubscription degree since it is critical to availability and efficiency. In real-time applications, the degree of oversubscription (possibly only 100% or less, which would be undersubscription, to be safe) would be critical since real-time applications cannot be denied resources that they thought they had acquired. SCALABILITY OF RELIABLE GROUP COMMUNICATION USING OVERLAYS This paper addresses the scalability of reliable group communication using overlays by using theoretical modeling, experiments in the Internet, and simulations of large networks. In particular, it addresses group throughput, buffer occupancy, and packet latency. Group throughput is defined as the minimum sending rate of all edges in the overlay. The authors show that regardless of group size and behavior of the underlying network connecting end systems, there exists a strictly positive group throughput. However, in order to ensure the scalability of buffer occupancy and packet latency, special rate control mechanisms are needed. Although the authors identify the requirements for a protocol for designing overlay-based reliable group communication, they do not provide a detailed solution. Future work would be a detailed protocol that satisfies their requirements. From: Pei-Hsi Chen Sent: Tuesday, September 21, 2004 2:38 AM To: Indranil Gupta Subject: 598ig review 09/21 Pei-hsi Chen pchen14@uiuc.edu -Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility This paper presents and evaluates PAST with a focus on its storage management and caching. The goal of the storage management is to allow high global storage utilization and graceful degradation as the system approaches its maximal utilization. PAST achieves this in two ways: 1. Replica diversion. 2. File diversion. Replica diversion can accommodate differences in the storage capacity and utilization of nodes within a leaf set. And when a node's entire leaf set is reaching capacity, file diversion will be performed to balance the remaining free storage space among different portions of the nodeId space in PAST. The cache management balances the query load for popular files and minimizes client access latencies. Both Freenet and Past use hashes, but hashes in these two systems achieve different objectives. In PAST, by using hashes, nodeId is location-independent, ownership- independent, and network connectivity-independent. Therefore, a set of nodes with adjacent nodeIds is highly likely to be diverse in these aspects. The replicas of a file stored in those nodes therefore are unlikely to be subject to correlated failures. In Freenet, a node tends to store files with similar file keys. And because keys are derived from hashes, lexicographic closeness of keys does not imply any semantic closeness. Therefore, files with close subject are likely to be scattered across the network, lessening the chances that failure of a single node will make all files with related subject unavailable. --SHARP:An Architecture for Secure Resource Peering SHARP is a framework for secure distributed resource management, resource control, and resource sharing across sites and trust domains. The control over resources is maintained by resource claims. The holder of a claim has the control of some resources over some time interval. There are two kinds of claims, 1. Tickets (soft claims). 2. Leases (hard claims). The distinction between them preserves local autonomy and local control over resources. The tickets can be oversubscribed and that makes tickets probabilistic. However, oversubscription can improve resource utilization and accountable ticket delegation can protect users against fraudulent agents. The security mechanisms provide authentication and authorization on access to shared resources and protect against a variety of attacks. Finally, the PlanetLab implementation demostrates the generality and efficiency of SHARP. --Scalability of Reliable Group Communication Using Overlays This paper studies the scalability of overlay group communication when TCP is used for providing reliable content delivery. Three key features are examined with respect to the size of the group: 1. Throughput. 2.Delay of packet .3.Buffer requirements at the end-systems. First, with unconstrained buffers at end-systems, the existence of a strictly positive group throughput is proved. Then it is shown that rate control combined with TCP congestion control mechanism allows one to bound the buffer occupancy and the latency for any arbitrary group size. From: Jigar Harshad Doshi Sent: Tuesday, September 21, 2004 2:04 AM To: Indranil Gupta Subject: 598ig review 09/21 The review was written after discussion with Ellick Chan (emchan) PAST: PAST is a file replication system which is built on top of Pastry. It provides three operations namely insert, lookup and reclaim. Each node is allocated a quota and nodes are allowed to specify a parameter k which decides how many times the file will be replicated. Files are not deleted they are reclaimed, which does not guarantee that the file will disappear from all nodes. PAST uses the routing substrate provided by pastry which is similar to Chord which means that a lookup takes around log 2^b (n) steps. Pastry also takes care of churn in the number of users joining and leaving the system. PAST uses hashed file and node ids like the other p2p systems. The key difference being that the file id space is bigger than the node id space and hence is truncated to fit the node id space. Then a file is stored at the node n and k of its neighbors. This simple scheme by itself is not very effective and its performance is very bad. Thus PAST introduces replica diversion and caching. Security mechanisms are introduced and every node is protected by a smart card based certificate. Storage imbalance occurs often in systems like past even though the file and node ids are uniformly distributed. This is because of imbalance in size of the files to be stored and statistical variations in the distribution of the ids as well as node storage variance. PAST introduces a number of mechanisms to take care of this. If the published capacity varies greatly from the normal, then each node registers as a number of node ids. Also files are diverted to other nodes if the current node is full. This is done under an admission policy and is controlled by a threshold t. Also file insertion may fail especially since PAST discriminates against big files. In this case the salt is varied and the insert operation is retried. PAST also proposes background diversion to take care of churn in the network .Caching by Greedy Dual Size policy is explored as an option to improve performance. The performance comparison lacks an analysis of the pastry overhead and the traffic in the system generated. The ananlysis shows that while PAST strives for gracefull degradation of performance. Near the maximum utilization point the performance degrades exponentially. Its is shown that caching in PAST greatly improves performance. While PAST represents a great attempt at building a replication system. It can be improved in various ways. Different caching policies can be explored because it is a key determinant of the performance of PAST. PAST can also be structured over a protocol like Kelips which guarantees a faster lookup time. The concept of death certificates can be introduced in order to make sure that a file is eventually deleted. The paper also lacks an explanation of the quota management but that is described in another submission. Scalability of Reliable Group Communication using Overlays: This paper gives a theoretical framework for understanding and analyzing the performance of an overlay based multicast scheme. IP based multicast schemes have been known to have vanishing throughput as the size of the group scales to infinity. On the other hand overlay based group communication protocols are shown in this paper to have a positive group throughput. One of the key assumptions of this paper is that buffers are infinite. The scalability of packet delay and latency are then shown to explode with time if the proposed throttling mechanisms are not implemented. First the authors consider a chain topology and then extend the work to other generalized topologies. The paper borrows heavily from F Bacceli and D Hongs paper on TCP being a max plus linear and they apply the algebraic techniques in order to deal with the overlay network. But instead of matrices the authors use weighted modern graphs. A key result of this paper is that an overlay is shown to have a throughput equal to the minimum of local maximum TCP throughput of overlay edges. Some of the key results from this paper with respect to overlay network design is that forwarding paths should be chosen such that the bottleneck overlay edge’s throughput should be maximized. The authors present an algorithm for doing the same first assuming the access link is not the bottleneck and then accounting for the bottleneck in access link. Also a forward rate control mechanism is proposed which is more efficient than the feed back based algorithms. Some of the issues which need to be addressed include the scalability issues of back pressure algorithms. SHARP : Sharp is an architecture for secure resource peering. It provides mechanisms for the advertisement and bartering of claims. These claims are cryptographically secure so there is always accountability of the peer claims. The system is highly flexible and allows a combination of local and global policies. It also allows an admission control mechanism such that any particular node is not heavily overloaded. The key aspect of SHARP is that it is completely distributed and does not use any kind of central resource management. SHARP uses the concept of agents and service managers in order to effectively barter and trade resources. The key construct is a resource claim which allows principals to control resources. This is cryptographically secure and non repudiable. Service managers use this mechanism to obtain tickets and leases. SHARP allows oversubscription of tickets to support decentralized resource management. Oversubscribed tickets may be rejected probabilistically. The key protection against attacks is that all claims in SHARP are easily traceable and cannot be spoofed. So even if a malicious node attacks the system, its identity will always be known. SHARP also uses economic measures in manipulation of tickets. There may be brokers and banks to deal with tickets. The authors have evaluated the system by implementing an instance over the planet lab testbed and the results are found to be very promising. The key advantage of the protocol is that it is very scalable. One of the key problems with SHARP however will be trust. Though SHARP makes sure that evry resource claim is traceable to an owner, the problems with compromising nodes will remain. In this case a black listing mechanism must be used. From: Pradeep Kyasanur [kyasanur@crhc.uiuc.edu] Sent: Tuesday, September 21, 2004 12:22 AM To: Indranil Gupta Subject: 598ig review 09/21 Submitted by Pradeep Kyasanur 1. Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility, A. Rowstron et al, SOSP 2001 PAST is a peer-to-peer storage management architecture that is implemented over Pastry routing scheme. PAST probabilistically guarantees balanced distribution of files among nodes that are geographically diverse by using uniform node and file identifiers. PAST includes explicit support to specify and account for diverse storage capacities available at different nodes in the peer-to-peer system. PAST inherits the locality properties from Pastry that enables a client request to be satisfied by nearby nodes. PAST adds caching support to ensure files that are more popular can be cached around the network improving fault-tolerance as well as performance. The evaluations in the paper demonstrate that PAST is largely successful in using local nodes for replies, and is capable of handling nodes with diverse storage capacities and requirements. While PAST seems to be efficient in inserting files into the storage system, removing files is hard (and unreliable). Thus, files in PAST may have infinite lifetime. In addition, unlike approaches like Freenet, there is no support in PAST for specifying versioning information about files, thus PAST is not useful for many publish-subscribe applications. 2. SHARP: an architecture for secure resource peering, Y. Fu et al, SOSP 2003 SHARP is a framework for secure distributed resource management. SHARP is designed to allow resource management in the absence of a central management authority, as is the case in many loosely coupled distributed systems such as PlanetLab. SHARP uses a security mechanism called "resource claims" to represent and control access to resources. There are multiple participating entities in the resource management protocol. The resources at each site are controlled by a site manager that maintains information about the status of each resource. The site manager runs a resource scheduler to enforce resource assignments. Clients requiring resources contact special agents which allocate resource claims. Claims are cryptographically signed and can be passed from one user to another. In the first phase, a claim called the ticket is ensured which does not guarantee a resource will be available. The claim needs to be converted into a lease by the site authority owning the desired resource for attaining guaranteed access to the resource. The paper shows that oversubscribing resources (by issuing multiple claims for the same resource) improves resource utilization by statistical multiplexing. Experiments of the framework on PlanetLab show that the framework can successfully support multiple resource allocation policies. SHARP requires a key management infrastructure to manage the integrity of keys used to cryptographically protect resource claims. In addition, since SHARP used asymmetric cryptography, it may introduce considerable computation burden on participating nodes. 3. Scalability of reliable group communication using overlays, F. Baccelli et al, Infocom 2004. This paper shows that reliable group communication can be implemented to scale well. In contrast, past results have shown that IP multicast does not scale well. The communication paradigm assumes a root node that is multicasting data to nodes in the peer-to-peer network using an overlay tree. Reliability is assured by using separate TCP connections on each overlay link, and providing sufficient buffering at each overlay nodes. For this paradigm, the paper shows that a non-zero throughput can be achieved even with large networks. However, although the throughput supported is non-zero in this case, delay and buffer occupancy becomes very large. The above algorithm assumes that the sender (root of the overlay tree) does not perform any rate control. However, if a suitable rate-control algorithm is implemented at the root node, then the buffer requirements and latency of the system are bounded. The paper assumes that multiple TCP connections emanating from an overlay node do not affect each other. While this analysis may hold when overlay traffic constitutes a small fraction of the internet traffic, it will certainly fail if a large number of such overlays are built over internet. The study does not consider the failure of nodes on the overlay which may drastically reduce the throughput on the overlay. Also, for the purpose of analysis, simple TCP growth equation is used which does not correctly account for timeouts, etc. From: Christopher W Banek Sent: Monday, September 20, 2004 11:54 PM To: Indranil Gupta Subject: 598ig review 09/21 Chris Banek 09/21 CS 598ig I feel all three of these papers present interesting viewpoints and considerations on many different types of distributed systems. File management and storage for later retrieval, as well as secure system management are important issues in the design of any high level distributed system. The article talking about reliable group communication is also terribly important, although a bit lower level and more mathematical, involving the statisical properties of message delivery. Of these articles, I feel that the secure resource management system SHARP presents the most interesting solutions and ideas. Instead of providing a system to solve one particular problem, such as queueing jobs on a cluster, or protecting copyrights on digital music, it provides an entire framework to work with. I think it also could provide an interesting solution that would make unused CPU cycles on home machines a commodity. Imagine your computer (left on of course) that has a broadband internet connection selling its spare CPU cycles to large distributed jobs. You could register with many different online CPU time "banks," giving a certain rate per cycle to run large batch jobs. Also your locality on the internet may come into play as businesses or groups looking for processing power look for large groups of computers with low latency to work together (like on the same DSL central office, or the same cable modem LAN). The article on storage management is also very interesting, providing a way to store files at different points on a network, using locality metrics to make transfers more effiecent. With some of the techniques they talk about, file sharing programs could be more and more widespread. Some of the issues contained not talked about though are problems with digital rights management (would having an illeagl file stored on your physical hard drive due to being inserted into a distributed system be illegal?). It would be interesting to have safeguards in either a don't-ask, don't-tell policy (aka, not let anyone on the machine read the data, or keep it in an unreadable format), or possibly set up rules for what types of files/size of files you would like to store. For music, this might become even more involved, as this would be a way for people to expand their collections, and would maybe only want certain genres of songs. Out of all these articles, I will admit I didn't understand much of the last paper. I think that although it makes sense at a theoretical level, the mathematics make it very confusing. Although I realize the mathematics are necessary in order to do the proofs they are talking about, I still feel it was the weak paper of the three. From: Romit Roy Choudhury [croy@crhc.uiuc.edu] Sent: Monday, September 20, 2004 11:32 PM To: Indranil Gupta Subject: 598ig review 09/20 Storage management and caching in PAST -------------------------------------- PAST is a peer to peer storage system in which the concept of overlay networks is utilized to maintain (insert, lookup, delete) k copies of a file. Distributing k copies of a file over the entire network (which is geograohically separated in the case of internet) can be a useful form of backing up, or might be a effective form of making a data available close to a client. The basic idea with PAST is that nodes are hashed (using SHA-1) to a ID space; files are also hashed to fileIDs. Any file is then routed to the k closest node IDs with respect to the file ID. The PASTRY routing mechanism is used to route these files in the overlay network. While retrieving a file, the k closest nodes to the corresponding fileID can be checked (in increasing order of routing cost) and the file copy can be obtained. File diversion is a mechanism to manage storage space -- when k designated nodes fall short of storage capacity, each of these nodes try to store the files at its leaf nodes. Caching is another idea introduced in this paper, whereby nodes that route a file keep a copy in the cache. A GDS mechanism is used for file eviction -- based on a metric C/S, where C is the cost and S is the file size, the least metric file is replaced by a new entry. A issue with PAST is that related files can get separated geographically, causing a client to add to the network traffic. While caching can alleviate this problem, it will not resolve it in the face of large storage traffic. Scalability of Reliable Group Communication Using Overlays ----------------------------------------------------------- This paper studies the scalability properties of group communication using overlay networks, on a TCP based internet backbone. The paper analyses 3 features of such overlays with respect to size -- group throughput, packet delay and buffer occupancy at each node. They develop mathematical models that capture the properties of the system, and were able to show that there exists a critical rate of injecting data at the sources, for which bounded latency and buffer occupany can be guaranteed. While most of the analysis is mathematical, the implication of the results can be summarized as follows. The key reason for differences in performance of IP multicast and overlays can be explained by the noting that in IP based communciation, each node in the networks is delayed because of waiting for ACK packets from its downstream nodes. However, using overlays, this is not necessary and each of the communication paths between overlay nodes can operate at different speeds given given some constraints of buffer capacities. As a direciton to protocol designers, the paper suggests that protocols need to be aware of the rates of paths between overlay nodes, the paths must be selected efficiently so that the maximum of the minimum-throughput paths are selected. And to do the latter, the bottleneck rate needs to be identified. Using these observations, protocols may be modelled to achieve non-vanishing througput even at large group size. SHARP: An architecture for secure resource peering -------------------------------------------------- An architecture is presented that allows scheduling and allocation of resources from a large network of computing elements. The resource management is performed in a secure manner. The basic architecture of SHARP may be summarized as follows. Agents are developed to arbitrate access to the resource through ticket exchanges. The tickets are probabilitic guarantees that the ticket-holder would be able to obtain a slice of the computing resource. On obtaining the ticket, the ticket-holder authorizes with the resource site, and claims the resource for a pre-specified period of time (like leasing). Tickets may be shared between multiple requestors. Generating false tickets can be handled by means of requiring to include the ticket generators ID and its parent's id, in the ticket. Later, if a situation of over-provisioning arises, the tree of ticket-genration process can be back-tracked to determine the culprit. The culprit can then be penalized. The architecture has been implemented on PlanetLab, and performance graphs have been presented to show that SHARP can increase resource utilization. From: Charles Yang [cmyang2@gmail.com] Sent: Monday, September 20, 2004 9:44 PM To: Indranil Gupta Subject: 598ig review 09/21 Charles Yang 9/21 Review * Storage management and cahing in PAST, a large-scale, persistent peer-to-peer storage utility. A. Rowstron & P. Druschel. PAST is a peer-to-peer storage utility built on top of the Pastry substrate. It allows for insertion, retriving, replication, and caching of files. Client nodes may initiate or route requests to insert or retrieve files, as well as contribute storage space. Files are looked up via a uinique fileID and are immutable; PAST is designed for archival and distribution use, not as a general file system. Files may also be reclaimed, that is, they may be lazily deleted from the system. Since it's built on top of the Pastry substrate, insertions and lookup times are O(log n). Nodes are assigned a 128-bit identifier in a circular namespace, and fileID's map to these nodes. Requests are routed to a "network local" node, but because of the way Pastry assigns node ID's, network local usually means regionally diverse. The main contribution of the paper is using storage management and caching to increase storage utilization and minimize routing. Files are replicated among network local nodes (which may be regionally diverse), and great effort is taken to ensure that storage usage is balanced among the nodes within the system. Storage managemnet in the form of replica and file diversion allow nodes to request other network local nodes to handle their own storage requests. Experiments have indicated that storage management is highly effective (60.8% -> 94-98%). Caching also leverages "unused" storage so that popular files can be located near where they are frequently requested. Some issues not investigated are the robustness of the system with regards to failures and churn. While they indicated that a file will most likely be available, they do not address the impact on the overall system. Another issue is, if there are a few nodes contributing a large percentage of the storage, what happens when those suddenly fail. * SHARP: an Architecture for Secure Resource Peering. Y. Fu, J. Chase, B. Chun, S. Schwab & A. Vahdat. SHARP is a framework fordistributed resource management. It allows for the decentralized allocation, sharing, validation, and caliming for distributed resources such as memory, processor usages, etc. It depends on public key cryptography in order to sign, validate, and authorize. The model does not depend on a central certificate authority, but each of its claims are self-certifying. In the paper, the authors implemented an instance of SHARP in Python on PlanetLab. The main contribution of the paper is with it's work with the allocation and decentralized management of scare resources is the idea of timed ticket and lease. A ticket is a soft-claim (probabilistic) on a set of resources that may or may not be granted by the actual resource provider. A lease is a hard claim (guaranteed) on said resources. The idea is analogous to seat bookings on aircraft: a ticket promises a seat, but in order to get on, one needs to exchange a ticket for a boarding pass. Ticket resources may in turn be delegated to a third party. Tickets are also only valid for a limited term. Oversubscription of tickets may be allowed in order to maximize utilization of resources. Again, it may be likened to airliners overbooking their flights. * Scalability of Reliable Group Communication Using Overlays. F. Baccelli, A. Chaintreau, Z. Liu, A. Riabov & S. Sahu. Instead of the traditional IP multicast, overlay networks have also been used to support multicast networking functionality. These have the benefit that it is not necessary for the underlying networking equipment to support IP multicast. In the paper, the authors provide a mathematical framework to address scalability of overlay multicast communication using TCP. They discover that irrespective of group size and behavoir of the underlying network, there exists a strictly positive group throughput. More specifically, the network throughput is lower bounded by the minimum of the local maximum throughputs. Secondly, they show that if a origin node throttles to a critical rate (the minimum of the local maximal throuputs), one can guarantee bounded buffers and latency at all nodes in the overlay. In other words, rate control with TCP congestion control is sufficient for throughput and buffer occupancy scalability. Implications fo protocol design are thus: 1. the overlay tree should be designed so the "weakest" link is made as strong (largest throughput) as possible. According to the paper, one of its main scientfic contributions is establishing a general link between scalability of reliable overlay multicast and the properties of hydrodynamic limits encountered in certain models of statistical physics such as percolation and particle systems. As to practical contributions, it's nice to have a mathematical backing that one should try to strengthen the weakest link, even in overlay multicast. From: Dennis Y Chi Sent: Monday, September 20, 2004 9:24 PM To: Indranil Gupta Subject: 598ig review 09/21 P2P Apps PAST PAST is a peer-to-peer (p2p) utility that provides persistent content location with high availability. The organization of nodes and routing of requests are handled by Pastry, a well-known p2p system. Pastry nodes contain routing tables that are organized based on common prefixes and locality metrics, and leaf sets that contain a subset of nodes with the closest matching identifiers. File insertion in PAST requires that k nodes closest to the file id store the file. Since Pastry provides locality, lookups usually will find the closest of the k nodes replicating the file. Although the hashing of nodes and files to identifiers help balance the load at each node, problems still exist if file sizes and storage at each node vary widely. Therefore, PAST uses replica and file diversion to manage storage and utilization. If one of the k nodes cannot insert the file, then it will choose a node from its leaf set that isn’t among the k closest to store the file (replica diversion). If this newly chosen node cannot store the file, then the client must rehash the file to insert it in a different node id space (file diversion). Also, intermediate nodes cache the files that they forward during lookups/inserts. The paper provides results that show that replica and file diversion greatly improve the success of insertions and storage utilization, and that caching improves the speed of lookups, especially during low utilization. Although the results show that PAST performs fairly well, the tests themselves were pretty lightweight – only 2250 nodes with a mean file size of ~10Kb. For a system that is supposed to be an “archival storage and content distribution utility”, more substantial test cases should have been used. Also, the authors could have done more work regarding the file encoding, determining if fragmenting a large file was advantageous in PAST. The paper seems to contradict itself regarding freeloaders – early in the paper it states “Optionally, nodes may also contribute storage to the system” but later in it states “If a node [advertised storage capacity] is too small, it is rejected.” Although Pastry provides O(logN) lookups, it seems that it would have a high message overhead to maintain the state information. Also, Pastry seems very similar to Chord, since Chord replicates files at successors and predecessors. If Pastry was chosen because it provided locality, then the paper’s caching performance results could have included delay, rather than number of hops, in order to highlight the benefits of building on top of Pastry. SHARP SHARP (Secure Highly Available Resource Peering) is an architecture designed to manage, control, and share networked computing resources securely and effectively. This framework consists of different entities exchanging claims – promises or rights to possess resources for certain amounts of time. Each user application has a service manager that requests a ticket (a claim that does not guarantee resource ownership) from an agent, who is delegated control of a set of resources. If the ticket is granted, then the service manager redeems the ticket with the site authority and receives a lease (a claim that guarantees resource ownership) if it can instantiate its service at that site. Agents have the freedom to distribute tickets and allocate resources however it chooses, and can oversubscribe tickets that exceed actual resources held by site authorities. Oversubscription improves resource utilization if service managers do not use all of their leased resources and resource availability if tickets are replicated between agents in case agents fail. In addition, the agents maintain routing tables using a Secure Resource Routing Protocol, which is similar to OSPF. Because SHARP is just like any other ‘computational economy’, authentication, integrity, and non-repudiation measures are used to ensure that entities do not act maliciously and are accountable. The paper provides performance results that mainly show how oversubscription improves resource utilization if offered load increases and if failures occur. This paper is interesting because it describes a fairly generic infrastructure, so agents and site authorities can implement different policies to achieve different goals (i.e. provide some quality of service resource guarantees to the applications). Also, the issue of accountability was well-addressed – agents are ultimately responsible for whatever tickets they issue. But the paper didn’t really address the issue of freeloaders or synchronization. SHARP assumed that clocks are closely synchronized, but that is difficult to ensure in a distributed environment. Scalability of Reliable Group Communication Using Overlays End-system multicast involves clients organizing themselves into an overlay and using point-to-point connections to communicate with each other. This paper analyzes the scalability of throughput, delay, and buffer requirements of the system when TCP connections are used. Assuming that buffer sizes are infinite, it is proven through theoretical modeling and experimental results that group throughput is equal to the minimum of the local maximum throughput of TCP connections at the overlay edges. The local maximum throughput of a TCP connection is the measured throughput when the sender node of the TCP session is driven by an infinite source. This result shows, regardless of the group size, that group throughput remains positive, unlike IP multicast. The previous result assumed that buffer sizes are infinite, so the paper goes on to show that buffer size and latency can be scalable. Based on the ergodic and central limit theorems, latency will converge to infinity if the source does not throttle packet emission. Further theoretical and mathematical analysis and simulation results show that latency and buffer occupancy converge as the depth of the overlay tree goes to infinity, as long as the throttling rate is bounded by the group throughput. Therefore, to improve performance, the overlay tree should be constructed appropriately as to maximize the minimum of the local maximum throughput of its edges. The experimental results in section 4.D supported the theoretical analysis, and it is interesting that by simply utilizing a rate control mechanism that buffer utilization can be kept very low and all the edges in the overlay graph can achieve the same throughput, considering the high variance in their maximum link throughputs. The paper could have showed the instability of the system when transmission rates were higher than 9.75kbps. It will be interesting to see how the results of this paper are received, and how they affect the design of future protocols and algorithms for end-system multicast. From: Ellick M Chan Sent: Monday, September 20, 2004 8:34 PM To: Indranil Gupta Subject: 598ig review 09/21 Ellick Chan CS598IG 09/20/2004 SHARP is a system that allows for secure highly available resource peering. Sharp allows server peers to better utilize their resources by providing a system to securely delegate resource tickets, which promise computational time slices, to other nodes. This behavior prevents catastrophic behavior such as an overloaded resource, which only exacerbates the problem when users request more often under these conditions. Sharp tickets are designed to work like airline tickets. The tickets have the following properties: 1) Non-forgeable 2) Tradable 3) Expiration 4) Accountability 5) No guarantee of resource availability The non-forgeable tickets may be traded by brokers to allow other nodes to use certain resources. To remain accountable, when a ticket is traded, a new one is created and signed with the appropriate private keys, which allows a chain of delegation to be stored with each ticket. When a node wishes to use a resource, it trades a ticket for a lease, which guarantees availability according to the XML- specified rules of a ticket. An oversubscription model is used to enhance efficiency, but this means that not all tickets may result in immediate leases. Much like modern day trade systems, Sharp can suffer the same seller credibility problems as eBay. The Sharp system relies on adjudication by a third party when individual nodes fail to behave responsibly. Sharp’s accounting policies help record proof of misuse, but the system does not provide reactive policies to actively limit the damage incurred. One way to fix this would be to try to provide a trusted gossip system. The Sharp system does not take into account multi-node collaboration. Since Sharp manages computational resources as well as physical resources like instruments or sensors, there are certain tasks that would take much longer in the absence of resource claims on several devices simultaneously. The secure resource routing protocol creates a chain of delegation to access of a resource. This somewhat myopic approach creates inherent unreliability if a process fails, the protocol must route around it, causing slowdowns. A different approach might use distributed hash tables like Chord or Kelips. PAST is a system that peer to peer file storage. It provides stronger availability guarantees than systems like Freenet or Gnutella, by ensuring there are k replicas of a given file(enforced by storage receipts). Simultaneously, its quota system prevents flooding of the network by limiting the individual number of files stored by any node. Past stores files by creating a hash of the file based on several identifying attributes. The replication stage ensures that copies are made at random according to nodes, which most closely match the randomized file hash. To address locality, Pastry keeps two tables, one of the nodes, which are l/2 closest by node id, and another table called the neighborhood set, which is maintained by locality metrics. Past allows nodes to maintain file diversion links (indirection to another node) in several cases, such as lack of storage space, to help balance the overall distribution of files within a network. Caching also helps Past maintain speed of operations; the policy used is GreedyDual-size. The main weakness of the paper is the lack of description of the quota protocols and the reclaiming process. The author does not specify how a global/local quota may be maintained, and the agreement protocols involved. The system simulations indicate that it will scale gracefully until the system is nearly fully utilized. However, all tests were simulated. Scalability of Reliable Group Communication Using Overlays: This paper shows that under certain conditions, overlay networks have advantages over IP-based multicasting. The authors mathematically model and show that irrespective of the group size, and behavior of the underlying network, there exists a strictly positive group throughput. The authors propose a pro-active rate throttle on the sender to maintain these properties. The condition for scalability is: when the local maximum TCP throughputs are strictly positive, the overlay is scalable. Rate control combined with TCP congestion control provides scalability in both buffers and throughput. The results of the paper suggest certain guidelines to follow, which will guarantee the scalability of a multicast network. In reality, it may not always be possible to meet these conditions, and the adversarial case is usually the norm. For example, if a back pressure is applied when buffers are nearly full, this may affect the throughput of the system. It is interesting to note that finer grained control over the multicast parameters using an overlay network can result in better performance, even though an overlay network by definition is not necessarily more optimal than the underlying network. However, the ideal conditions to form a scalable overlay may not always be possible, and it would be beneficial to see some actual results versus normal IP multicast instead of the simulation, especially in adversarial cases. From: Vartika Bhandari Sent: Monday, September 20, 2004 5:29 PM To: Indranil Gupta Subject: 598ig review 09/21 Storage management and caching in PAST , a large-scale persistent peer-to-peer storage utility This paper describes a p2p persistent storage system called PAST. PAST uses Pastry as the underlying routing substrate. In Pastry, nodes possess 128-bit identifiers (obtained via SHA-1 hashing). File-IDs are 160 bit long. A file is replicated on k nodes with identifiers numerically closest to the 128 MSBs of the file-ID. This allows for high availability. Routing takes 0(log n) time and 0(log n) state is required per node. PAST generates file identifiers using SHA-1 hashes. The file (which is signed) is then appropriately routed to the replicating nodes. There is a notion of client storage quota, and it is also updated accordingly. PKC is used for security. PST resolves load-balancing issues by allowing replica-diversion whereby a file may be located at a node not numerically k-closst which belongs to the leaf-set of one of the k-clossest nodes. Caching is also employed to reduce access latencies. A GreedyDual-Size replacement policy is adopted. A prototype implementation has been built using Java. The issue of how files are assigned weights for cache-replacement purposes is not explained. Besides, it is not clear how the "client quota" mentioned in section 2.2 can be effectively (and dynamically, with changing available storage due to churn) determined in a decentralized way. SHARP: An architecture for Secure Resource Peering The focus of this paper is on providing a framework for secure sharing of distributed resources e.g. in a computational grid. The objective is to allow users to obtain resources in a predictable manner, prevent resource-pilfering, and provide fairness characteristics, within a decentralized framework. SHARP operates using a notion of tickets which embody a non-binding promise of resource-availability.Tickets are issued by agents, and over-subscription is allowed to achieve high utilization. Individual site authorities retain the right to reject a ticket based on the then-current situation. Tickets are traceable back to the issuing agent thereby allowing for accountability. Delegation of acquired resources to another holder is allowed. SHARP primarily focuses on the security and trust aspects of resource peering. Scalability of Reliable Group Communication Using Overlays This paper studies the scalability of overlay-based multicast where the multicast distribution graph comprises individual host-to-host TCP connections. A tandem network is considered and an analytical model is obtained using max-plus algebra. Infinite buffer space is assumed at each node. The authors show that without control of rate, at least one station will show unbounded increase in queue length and hence delay, while with rate-control, there is a stable finite delay. The authors also discuss some implications on protocol design. One observation is that the bottleneck link capacity be maximized, which is intuitive. Overall, the paper states intersting results. However some more intuitive explanations would have helped make the math clearer. From: Moosa Muhammad Sent: Saturday, September 18, 2004 10:19 PM To: Indranil Gupta Subject: 598ig review 09/21 ------------------------------------------- Written By: Moosa Muhammad For Presentation Dated: 9/21/2004 (1) Review of "Storage management and caching in PAST, a large-scale persistent peer-to-peer storage utility" This paper presents and evaluates the storage management and caching in PAST, which is a large-scale p2p persistent storage utility. PAST is based on a self-organizing Internet-based overlay network of storage nodes that cooperatively route file queries, store multiple replicas of files, and cache additional copies of popular files. The purpose of PAST is to provide archival storage and content distribution mechanism, and is not a general-purpose filesystem. To retrieve a file, a client must know the fileID and decryption key (if needed), as it does not provide facilities for searching or directory lookups. The reason for this is that it uses Pastry (these facilities are under research for Pastry), as an efficient routing scheme. Client requests to retrieve a file are routed, with high probability, to a node that is "close" (based on a scalar metric such as number of IP routing hops, geographic distance, etc.) to the requesting client. NodeIDs and fileIDs are assigned in a similar fashion to Chord. To insert a file, it is stored on k PAST nodes whose nodeIDs are numerically closet to the 128 most significant bits of the 160-bit fileID. This ensures that with high probability, the k replicas are stored on a diverse set of PAST nodes. For popular files, caching is provided by storing additional copies of files in the unused portions of PAST node's local disks. PAST's storage management aims at allowing high global storage utilization and graceful degradation as the system approaches its maximal utilization. The primary responsibilities of the storage management are: (a) Balance the remaining free storage space among nodes in the PAST network as the system-wide storage utilization approaches 100%, also called replica diversion (b) To maintain the invariant that copies of each file are maintained by the k nodes with nodeIDs closet to the fileID In the first experiment that they conducted, both replica diversion and file diversion were disabled by setting the threshold for primary replica stores to tpri=1, and setting the threshold for the diverted replica stores to tdiv=0. This resulted in 51.1% of file insertions to fail and the global storage utilization of the PAST system was only 60.8%, showing that there is a need for storage management in PAST. By having file and replica diversion enabled, the storage utilization rose to about 95%. They also concluded that as tpri or tdiv was increased fewer files were successfully inserted, but there was higher storage utilization. Some comments and possible future extensions to this project (or the Pastry project in general) are to add the searching and directory lookup mechanisms to it, to make it into a more practical application for p2p file sharing and storage. Another one is to create some sort of a hierarchical storage management, in a way mimicking the UNIX file directory structure and compare the performance of such a system with this current implementation. Also come up with some informed heuristics to reduce the number of replica/file diversions. (2) Review of "SHARP: An Architecture for Secure Resource Peering" This paper presents a new approach to secure and flexible resource management for wide-area networked systems, through the design and implementation of SHARP. SHARP enables the user to control resources for designated time intervals, through the use of cryptographically protected resource claims. It also provides secure mechanisms to distribute claims across a network of resource managers, leading to resource peering where sites can trade their resources with peering partners or contribute them to a federation according to local policies. Through separation of claims into tickets and leases, local control over resources and site autonomy is being preserved also. Systems such as PlanetLab and Grid share networked computing resources (under coordinated control) for performing computationally intensive calculations. Such systems need an effective resource management for fair sharing of resources, which SHARP can provide for them. The general mechanism for sharing resources among sites is that sites delegate control over their resources to agents. Service managers obtain resources at a SHARP site using a two-phase process. In the first phase they obtain a ticket from an agent, which does not guarantee resource ownership (due to probabilistic resource allocation). The second phase consists of the service managers presenting this ticket to the appropriate site authority to redeem it; if approved, they will receive a lease for any subset of the resources or term specified in the ticket. Some of the contributions of this paper include, defining self-certifying secure claims to global resources that can be safely shared across trusted domains and demonstrating the practicality of their approach by running it across the PlanetLab wide-area testbed. In the future, a few extensions to this project can be made, for example a full exploration of policies for resource management and oversubscription policies for replicated ticket sets. They did a failure analysis of this system, by killing one agent at a time. An extension to this might me to kill multiple agents at a time, to observe the behavior of the system and the performance implications that arise from such a scenario. (3) Review of "Scalability of reliable group communication using overlays" Up until recently, a lot of work had been done in reliable group communication, by means of IP supported multicast schemes. These techniques have major limitations, among them are scalability and the lack of wide spread deployment of IP multicast in the Internet. Recently, reliable multicast has been implemented in overlay using point-to-point TCP connections. This paper presents a mathematical framework based on the max-plus representation of TCP to address the scalability of overlay group communication when TCP is used to provide reliable content delivery. This is a major contribution since such a detailed performance analysis on this topic had been missing. They used theoretical investigations, experiments on the Internet, and simulations of large networks, to examine the scalability of three key features of such overlays with respect to size of the group, which were: throughput of the group communication, delay of packets to reach end-systems, and the buffer requirements for the end-systems. From their analysis, they derived the following important results: a) Irrespective of the group size and the behavior of the underlying network, there exists a strictly positive group throughput. Also, they established the maximum possible throughput achievable for a set of receivers and the conditions that are required to achieve it. b) They analyzed a pro-active mechanism which throttles the sending rate of the source, and showed that there exists a critical rate such that when achieved, one can guarantee bounded buffers and latency at all the nodes in the overlay tree. Some future ideas for extending this work include investigating the scalability issue of the tree topology when the buffer of a node is full, resulting in the stoppage of communication from the upstream node. This can cause throughput degradation. A comment that I had regarding their analysis was that they did not mention the TCP parameters that they used in their implementation. For example, TCP failure model allows one to use timeouts and retransmissions to deal with lost packets. Also the mechanism that they used to deal with lost packets is not completely clear to me, by reading their paper. --------------------------------------------