From: Muyuan Wang [mwang2@uiuc.edu] Sent: Tuesday, February 14, 2006 1:23 PM To: Indranil Gupta Subject: 598ig review 02/14 Sorry, I mailed this to wrong address. Thanks, Muyuan -----Original Message----- From: Muyuan Wang [mailto:mwang2@uiuc.edu] Sent: Tuesday, February 14, 2006 9:00 AM To: 'indy@cs.uiuc.edu' Subject: 598ig review 02/14 Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility Summary: This paper presents and evaluates the storage management and caching in PAST, a large-scale peer-to-peer persistent storage utility. PAST is based on a self-organizing, Internet-based overlay network of storage nodes that cooperatively route file queries, store multiple replicas of files, and cache additional copies of popular files. It aims at combining the scalability and self-organization of systems like FreeNet, as well as providing the strong persistence and reliability, to make the system to be a successful archival storage system. The most distinguished feature of PAST is its storage management. 'Replica diversion' is used to accommodate differences in the storage capacity and utilization of nodes within a leaf set, and 'file diversion' is performed when a node's entire leaf set is reaching capacity. The storage management and caching system considers non-uniformity of storage node capacities, and non-uniformity of different files' popularity, i.e. to minimize fetch distance. The goal of cache management is to minimize client access latencies, to maximize the query throughput and to balance the query load. PAST resolves the conflict between system-wide storage utilization and maintaining the invariant that copies of each file are maintained by the k nodes with nodeIds closest to the fileId. The system is built upon Pastry, so it leaves the routing task to Pastry. Comments: The PAST system focuses on some extreme situations, i.e. the system wide storage utilization is approaching 100%. However, the paper does not give the statistics about how frequently will these situation be achieved in real-life systems, so it is some what doubtful how much better the PAST is, comparing with current systems. Another feeling is, since the PAST system is based on the Pastry, a peer to peer routing substrate, and some descriptions or implementations must be found in other literatures. Maybe that's one of the difficulties for me to tell and grasp the essential idea and techniques of this paper. ?????????????????????????????????????? ?????????????????????????????????????? ???????????? SHARP: An Architecture for Secure Resource Peering Summary: The goal of this work is to develop fundamental mechanisms to allocate resources across the system coordinatively. Basically the paper claims its three contributions: 1. they define self-certifying secure claims for safe global resources delegation. 2. a computational economy among individual trust domains is build, considering pair-wise arrangements. 3. They performed lots of experiments, on large scale testbed, to demonstrate the advantage of this system. The novel part of this paper is to use protected 'resource claims' to subdivide and delegate claims across resource managers. The secure mechanisms enable sites to trade their resources with partners or the whole federation. There are two types of claims, so called 'tickets' and 'leases'. They two together can allow coordinated resource management while preserving local control of resources for each site. Based on these mechanisms, they establishes a more decentralized structure in which multiple resource manages control different portions of the global resource pool, instead of using a trusted central resource manager that assumes ownership of all resources. Comments: Compared to the 'PAST' paper, I think this paper is better organized, and easier to understand. One possible future work is, when considering the effect of oversubscription in the case of the failures, it can be observed that there should be a tradeoff between the OD and the resource utilization caused by failures. Including this phenomenon, how to achieve the best tradeoff could be further explored. From: Andrew Badr [andrewbadr@gmail.com] Sent: Tuesday, February 14, 2006 11:35 AM To: Indranil Gupta Cc: Mike Earnhart Subject: Fwd: Final Copy Attachments: 02142006-1 FINAL.ppt here's our presentation ppt From: Bach Duy Bui Sent: Tuesday, February 14, 2006 9:33 AM To: Indranil Gupta Subject: 598ig review 02/14 PAST 1. Summary PAST is a peer-to-peer storage utility, which exploits the multitude and diversity of nodes in the Internet to achieve strong persistence and high availability. Any files inserted into PAST is quasi-unique since it is viewed by the network through a unique fileID. Each file is stored at multiple nodes with adjacent nodeIDs. The system is designed so that the high diversity of adjacent nodeID nodes is highly likely. File inserting and searching functions are built on top of an efficient, scalable, fault resilient and self organizing routing substrate: Pastry. By storing at each node a routing table containing nodes whose nodeIDs are at different matching level, Pastry is able to perform routing request in O(log(n)) time. Node failure and recovery rely on a soft-state protocol in which neighboring nodes in nodeID space periodically exchange keep alive massages. PAST also includes storage management and cache management functionality. The design of storage management aims at the goal of balancing unused storage space among nodes in a leaf set while respecting nodes' diversity of storage capacity. The goal of cache management is to increase the availability of very popular file, minimizing client access latencies and maximizing query throughput. 2. Comments Authors do not explain how a new node can get the address of "nearby" node when it tries to join the network. How this can be implemented in practice is important for self-organizing of PAST. Although by maintaining k replica PAST can increase files' persistency and availability, this maintenance makes system complicated. In a distributed environment, this maintenance requires much communication overhead. The tradeoff between the system persistency and the overhead to maintain it should be investigated. SHARP Sharp propose an approach to flexible secure resource management for wide-area networked systems. The goal of the system is to create a management system, which allow actors to reserve resources across the system for predictable behavior and prevents actors from stealing resources help by others. The system is designed to work in an environment where the actors may be mutually distrusting and subject to attack and subversion from third parties. SHARP comprises of (1) site authority: trusted principals who conferring ownership of the resource (2) agents: the entity who claims to have resource and possess a signature (public key) issued by a site authority (3) service manager: the entity who uses request resources. A principal claim for resource through a resource claim process in which the claim ticket is securely signed by site authority to protect the integrity of claims. A holder of claim may delegate some of the claimed resource to another principal in the way that relationships among claims are securely apparent. Claim expiration is used to control the resource availability. Oversubscription technique is also used in SHARP to improve the efficiency of distributed resource discovery and availability of resource. From: Juan Jose Jaramillo Jimenez Sent: Tuesday, February 14, 2006 9:26 AM To: 'Indranil Gupta' Subject: 598ig review 02/14 1 Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility ---------------------------------------------------------------------------- -------------------- 1.1 Summary The authors present a P2P persistent storage utility named PAST and evaluate its storage management and caching performance. PAST is based on an overlay network atop of Pastry, a distributed object location and routing for large-scale P2P systems. PAST has the following set of operations: (1) insert a new file in the network creating a new fileId, (2) retrieve a copy of a file identified by fileId, and (3) reclaim the network storage capacity of the file fileId (in order to reclaim the user quota). The number of copies k of a file stored in the network is chosen to meet the availability needs of a file according to the node failure rate. Since the nodeId and fileId distribution is randomized, and since the size distribution of files is heavily tailed, it is necessary in PAST to devise a way to achieve some sort of load balancing. This is accomplished using replica diversion (a node that cannot store a copy of a file sends it to one of its leaf nodes to see if they can store it and then sets a pointer to where the file is actually stored) and file diversion (if an insert operation fails, the originating node can try to generate another fileId and send it to the new nodes to see if they can store the file). The idea of cache management in PAST comes from the need of making additional copies of highly popular files. Cache management thus tries to sustain lookup load while minimizing client latency and network traffic. Cache insertion is done inserting the file in a node that is part of the lookup route and has enough capacity to store it. Cache replacement is done deleting the oldest cache copies in order to make room for the newer copies. 1.2 Comments and ideas for future work * Each node and file is located in the network using a scheme similar to the one presented in Chord: for both, it is generated a node and file id using a hash function (SHA-1), and a file is stored in the nodes that are closest to the fileId. * As stated by the authors, the main problem with this scheme, and any other P2P storage system, is achieving load-balancing, since file size distribution is heavy tailed. Thus a plausible research topic is trying to find algorithms that achieve fair load-balancing in P2P systems. 2 CoDNS: Improving DNS Performance and Reliability via Cooperative Lookups -------------------------------------------------------------------------- 2.1 Summary Most research today is focused on improving server-side infrastructure, and assumes that caching and redundancy in the client side is enough. However, the authors state that they found that most common problems in DNS lookups are due to intermittent failures in the local name-server, thus the motivation for the paper is trying to find a way to improve average lookup latency and DNS availability. The measurements done in PlanetLab by the authors show that although most local DNS lookup times are small, but often degrade dramatically, significantly increasing the average lookup times. Thus, a small percentage of failures dominate the total lookup times. They even show that this kind of instabilities is widespread and frequent, and therefore client-side problems are significant. Although the authors cannot be 100% sure of the origins for these failures, they pinpoint the probable causes: (1) packet losses in lookup messages, (2) local name-server overloading, (3) resource competition due because usually DNS services are run with other services in the same machine, and (4) maintenance repair times to restore DNS service. This motivation lead the authors to develop CoDNS, a cooperative lookup service where peer nodes help in the DNS lookup if the local name-server is currently experiencing performance problems: if a node experiences long lookup delays, it then relays its lookup to its neighbor nodes to see if they have in cache the answer to the query. 2.2 Comments and ideas for future work One of the main problems of this approach is peer trusting: if a malicious node makes a denial-of-service attack in the name-server, overloading it with dummy requests, then it can start spreading false lookup results that can send users to wrong addresses. This has legal implications, since this can be used to do phising, one of the biggest problems the banking industry is facing right now. Thus, an area of future work could be to develop algorithms to guarantee the validity of the information sent by peers. From: Anthony Cozzie [acozzie@gmail.com] Sent: Tuesday, February 14, 2006 4:05 AM To: Indranil Gupta Subject: 598ig review 02/14 Anthony Cozzie 1. CoDNS This paper points out that DNS exhibts a somewhat bimodal behavior: most of the time, the query is either answered in <100 ms or not at all/after quite some delay. Furthermore, most of the failures involve the DNS server (UDP buffer too small, cron job, server failure) or the direct connection (dropped packet) between them, so their is not much correlation between server failures. CoDNS works by simply dramatically reducing them timeout for DNS requests (5 seconds to 500 ms or so) and then forwarding a timed-out request to multiple servers (8 was found to work extremely well). The system is found to improve the latency considerably, and also to reduce the _variance_ of the latency which is also quite nice. Advantages: seems a nice improvement on DNS Disadvantages: why not just fix the stupid DNS server? most of the problems they describe are created by idiot sysadmins 2. Past PAST appears to be an extension of Pastry (also developed by M$). I'm not 100% sure how Pastry works, but it appears to feel sort of like a K-map; that is, request paths look something like 010110 -> 000110 -> 001110 -> 0001000. Pastry appears to have some advantages over Chord, especially in terms of taking advantage of network locality and in security, due to the fact that in Pastry there are multiple paths to a node. Anyway, PAST simply lies on top of Pastry and maintains files at the K closest nodeids rather than only 1. They also describe a system for breaking their K rule slightly (by 1 hop) in order to use the storage more efficiently. Advantages: seems secure and reliable to this graduate student Disadvantages: Joining the network, like in Chord, may be very expensive and require tens of GB in traffic. From: mbakht@gmail.com on behalf of Mehedi Bakht Sent: Tuesday, February 14, 2006 9:07 AM To: Indranil Gupta Subject: 598ig review 02/14 Netid: mbakht2 Title: Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility Summary: The paper presents and evaluates the storage management and caching in PAST, a large-scale peer-to-peer persistent storage utility. PAST has been designed based on the four goals of strong persistence, high availability, scalability and security. With high probability, the set of nodes over which a file is replicated is diverse in terms of geographic location, ownership, administration, network connectivity, rule of law, etc. Pastry is used to ensure that client requests are reliably routed to the appropriate nodes. The number of messages exchanged while routing a request as well as the number of PAST nodes traversed is logarithmic in the total number of PAST nodes in the system. However, there is no provision for searching, directory lookup or key distribution. Where a file will be stored is decided based on quasi-unique fileId, generated at the time of file's insertion into PAST. In case of rare fileId collision, the file inserted later is rejected. Otherwise, PAST stores the file on the k PAST nodes whose nodeIds are numerically closest to the 128 MSBs of the file's fileId. Quasi-random assignment of nodeIds and fileIds ensures that the number of files stored by each PAST node is roughly balanced. Pastry repeatedly takes a locally "short" routing step towards a node that shares a longer prefix with the fileId. PAST's storage management scheme tries to achieve high global utilization and graceful degradation as the system approaches its maximal utilization. There seems to be a conflict between two goals. Requiring that a file is stored on k nodes closest to its fileId leaves no room for any explicit load balancing. This conflict is resolved by replica diversion and file diversion. Experimental results from a prototype implementation of PAST is given at the end of the paper. The results show a very high global storage utilization accompanied by a very low rate of failed file insertions. Comments: The provision for reclaim is not equivalent to a delete operation. So even if the author wants, he cannot delete a file with full guarantee. Isn't it a very undesirable feature? ------------------------------------------------------------------------ -- Title: SHARP: An Architecture for Secure Resource Peering Summary: This paper presents the design and implementation of SHARP (Secure Highly Available Resource Peering) - a new architecture for distributed resource management, resource control, and resource sharing across sites and trust domains. Central to this system is the idea of resource claims – which allows principals to formulate and exchange unforgeable assertions about control over resources in a two-phase scheme. In the first phase of obtaining a resource, a resource claim in the form of a ticket needs to be obtained from an agent. A ticket does not guarantee resource ownership but merely suggests it. In the second phase, this ticket is presented to the appropriate site authority. The authority may reject the ticket or it may honor the request by issuing a lease for any subset of the resources or term specified in the ticket. Encryption based mechanisms are used for validating tickets and protecting the integrity of claims. In order to increase efficiency and resource availability, SHARP uses two techniques. One is the use of soft-state claims of fixed duration which ensures that a failed entity will not forever block access to any resources held by it. Second is the provision for oversubscription – allowing site authorities and agents to issue tickets that exceed their resource holdings. Comments: Fairness is an important issue in resource management. There seems to be no explicit attempt to implement it in SHARP. -- Mehedi Bakht Graduate Student Department of Computer Science University of Illinois at Urbana-Champaign From: Long Hai Vu Sent: Tuesday, February 14, 2006 8:50 AM To: Indranil Gupta Cc: Long Vu Subject: CS598IG-review 02/14 CS598IG: REVIEW PAPERS Date: 02/14/2006 Name: Long Vu Title: Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility Short summary: In this paper, authors first propose a large scale peer-to-peer persistent storage utility called PAST, they then perform evaluation on its capability in term of storage management and caching. Like many other P2P system, nodes and files in PAST are assigned a quasi-unique key based on SHA1 hash function (with 160 bits and 120 bits for file and node respectively). They try to optimize the global storage utilization and make use of network proximity (i.e. scalar metrics such as the number of routing hops, bandwidth, distance...) to route the requests to desired resources. As stated, the popularity of files determines the number of their copies; that is, they minimize the fetch distance and balance query load. The peer-to-peer routing substrate called Pastry is also mentioned in this paper. Pastry routes a request to the node whose nodeId is closet to the request. Further information about Pastry: http://research.microsoft.com/~antr/Pastry/ Discussion - Insert operation: k is one of 4 input parameters of insert operation which is set by 5 in the paper with no explicit explanation. k is the number of replication of a file in PAST system. It is actually very important parameter as it influences directly the query load and fetch distance. Thus, setting k value becomes essential. If k is too small, the popular files are not stored in enough number of nodes. In contrast, if k is set too large, the unpopular files are redundant. Are there any ways to change this value statistically and adaptively? Currently, caching is used in this paper for popular files but it does not guarantee that the cached copies will be kept in adjacent nodes. Could we use a new insert mechanism to insert new copies rather than just keep cache? - What happens if PAST has high churn; that is, node joint and leave the system with high frequency? In this case, it is really hard to keep load balancing. For example, before inserting node C into PAST system, B is the closet node of A and B stores all copies of files in A (this way improves Pastry’s performance). Assuming that after inserting C into PAST, C is closer to A than B. This decreases system performance in the future as C may have less number of files in A compared to B. Title: SHARP: An Architecture for Secure Resource Peering This paper is concentrating on a central issue in the process of sharing data across a distributed system: how to manage distributed resources securely and create cryptography to protect these resources. In SHARP, peers could trade their resources with their partners or contribute these resources to a federation. There are three contributions presented in this paper: - defining self-certifying secure claims to global resources and allowing sites to trade them across trusted domains - proposing the idea of building a computational economy among trusted domains - evaluating their approach by large-scale experiments with SHARP prototype. One of the key points in SHARP is a resource claim: ticket and lease. A ticket presents a soft claim which may but does not guarantee the ownership of the resource. That is, the site with ticket may not be the owner of the resource. The service manager presents ticket to the appropriate site authority for redeeming. If possible, the authority would issue a lease for the resource. In contrast to ticket, lease is a hard claim. As a consequence, a site may oversubcribe its resources by issuing more number of tickets than the number of resource claims it could hold. This influences the efficiency of SHARP. To solve this, two techniques are proposed. The first one is soft state claim with fixed duration, but how to choose the claim duration? This is important as if the claim duration is too long or too short, resources are not used efficiently. The second allowing site to issue more number of resources than it is holding. From: ercanucan@gmail.com on behalf of Ercan Ucan Sent: Tuesday, February 14, 2006 7:09 AM To: Indranil Gupta Subject: "598ig review 02/14" Storage management and caching in PAST -------------------------------------------------------------- Problem Addressed: The authors of this paper present and evaluate the storage management and caching in Past, which is a large-scale peer-to-peer persistent storage system. In the Past, nodes that store the files and the files that are stored in the system are each assigned uniformly distributed identifiers, and replicas of a file are stored at the nodes whose identifier matches most closely the file's identifier. This approach approximately balances the number of files stored on each node. However, non-uniform storage node capacities and file sizes are an important issue in load balancing and it requires more attention in order to have a better global content distribution. Also, non-uniform popularity of the files requires caching in order to minimize the fetching distance and balance the query load. Past is based on Pastry, which is a self organizing, Internet based overlay network of storage nodes. Approach Taken & Comments: The Past is consisting of nodes that are connected to the Internet, and each node has the ability to initiate and route client requests to insert or retrieve files. Also, optionally nodes can contribute storage to the system. Files that are stored in the Past system are associated with a 'quasi-unique' fileID that is generated during the insertion of file into system. Thus, files stored in the system are immutable files since a file cannot be inserted with the same fileID more than once. So, this means that update operation to the files is not possible. From my point of view, this is a drawback of the system. Files can be shared at the owner's discretion by distributing the fileID around and with a decryption key as necessary. Pastry, which is an efficient routing scheme, is used to ensure that client requests are reliably routed to the appropriate nodes. With high probability, the client's requests to retrieve a file are routed to a node that is close in the network to the client that issued the request. To retrieve a file in Past, a client must know the fileID that belongs to the file and decryption key as necessary. This means that there are no facilities in Past such as searching, directory lookup or key distribution. This can be categorized as another drawback of the system. That being said, authors note that Past is intended as an archival storage and content distribution utility and not as a general purpose file system. They have the assumption that users interact primarily with a conventional file system, which acts as a local cache for files stored in Past. There are three main operations in Past. Insert lookup and reclaim. Insert(name, owner-credentials, k, file) is the function prototype of insertion operation. This function stores the file at k diverse nodes within the Past network. Lookup(fileID) is the function prototype for retrieving a copy of the file indentified by fileID if it exists in Past system and if one of the k nodes that store the file is reachable over the internet. Reclaim( fileID, owner-credentials ) is the function prototype for reclaiming the storage occupied by the k copies of the file identified by fileID. Unlike a delete operation, reclaim does not guarantee that the file is no longer available in the system after it was reclaimed. This design decision is made in order to make the life easier for the system designer. There are two important tricks in the storage management of Past which are used to achieve a high global storage utilization. These are replica diversion and file diversion. Past allows a node that is not one of the k numerical closest nodes to the fileID to alternatively store the file, if it is in the leaf set of one of those k nodes. This is what they call replica diversion. If a nodes entire leaf set is reaching capacity, then file is diverted to a different part of the nodeID space by choosing a different salt in the generation of its fileID. These ideas are cool and simple enough to implement. In the experimental results the paper shows that system minimizes the fetch distance, balances the query load for popular files and displays graceful degradation of performance as the global storage utilization increases beyond 95%. SHARP --------- Problem Addressed: In this paper, the authors present Sharp which is a secure framework for secure distributed resource management in a large-scale computing infrastructure. The meaty part of Sharp is the structure to represent cryptographically protected resource claims(i.e. rights to control resources for certain time intervals) and also to subdivide and delegate these claims across a network of resource managers. By this way, resource peering is enabled, meaning that sites may trade their resources with peering partners or contribute them to a federation according to local policies. Approach Taken & Comments: Main approach in the designing of the system is the separation of claims into so called tickets and leases. This separation allows coordinated resource management across the whole system and at the same time it preserves site autonomy and local control over resources. Moreover, mechanisms for controlled and accountable oversubscription of claims are presented. These are the fundamental elements for dependable and efficient resource management. Basically in this approach, there is a decentralized structure in which multiple resource managers control different and possibly overlapping portions of the global resource pool. Above mentioned claims are soft-state timed claims and they expire after a specified period. The idea behind this application is that to give the system the opportunity to recover the resources if a claim holder fails. Also, resource managers may oversubscribe resources to improve resource efficiency and availability when claims are lost or left idle. Claim holders have a probabilistic guarantee that their claims will be granted to them. Sites can detect the oversubscription of claims and may reject these claims in order for the delegates not to have more than their allotted share of resources. Thus, claim delegation is so called accountable to protect against compromised resource managers. Claims in the Sharp are cryptographically signed. By this way, it is provided that the claims are not forgeable and they are also non repudiable. There is no assumption of a trusted certification authority. This means that each site is free to act as its own authority to certify keys and to grant or validate claims on its local resources. "Each claim is authorized by a chain of signed delegations anchored in the site authority itself. Sharp claims are self-certifying: an agent endorses the keys of its peering partners after validating them using any locally preferred authentication mechanism. In essence, each site is the root of a certificate mesh that parallels the resource peering relationships, certifying transitive trust for all principals that claim its resources." -- Ercan Ucan - eucan2@uiuc.edu Graduate Student Computer Science Department University of Illinois at Urbana-Champaign ------------------------------------------------------------ From: Hussam Hasan Abu-Libdeh Sent: Monday, February 13, 2006 11:48 PM To: Indranil Gupta Subject: 598ig review 02/14 The two papers that i'd like to review are: 1. SHARP: An Architecture for Secure Resource Peering. 2. Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility. SHARP (An Architecture for Secure Resource Peering): Personally, i thought that this paper was fun to read. Mainly because of this picture that it painted in my head (with its analogies to) a marketplace where people are exchanging goods. So, the whole idea behind this paper is that you want peers to be able to share their resources in a secure way with no one centralized server while achieving high dependability and efficient resource management. So, the way that this works is by having some nodes work as middle-men between the machines that would like to run a task, and the machines that are willing to share some resources for that task. The whole system is based on trust and mutual benefit for the nodes. Meaning that the nodes share their resources with hopes that other machines will let them use their resources in exchange (at least that is how i understood it). Thus far, this system sounds pretty much like other systems (except for some small changes), so what is the big deal ? Well, what i found interesting is that the middle-men (agents) aren't just middle-men used for facilitation, but they can actually act as brokers "buying" and "selling" resource shares/slices. Interested machines contact some agents and get a ticket that promises them some resource time which may or may not be fulfilled. These tickets have the characteristic that they are probabilistic (may not be fulfilled) and expiable (the ticket doesn't last for ever). Agents are punished/rewarded-with-more-trust based on their behavior and the percentage of fulfilled tickets that they issue. As far as security is concerned, the system was designed with security in mind! Each ticket/claim holds hashed crypt values that determine the issuer of that ticket with no doubt. Also, authorative servers could also be utilized. The paper outlines the security architecture assumed for SHARP as well as possible threats and vulnerabilities. Finally, the paper does a case study of SHARP on PlanetLAB and analyzes the outcome. Overall, this paper was enjoyable to read, and i have to admit, i really liked the analogies that they drew. --------------- Storage Management and Caching in PAST: This paper gave a brief overview of the PAST system then discussed the proposed Storage Management system. The goal of this storage system is to allow high global storage utilization and graceful degradation as the system approaches maximum utilization. Since the behavior of PAST peers is not determined before hand, we have a high chance of storage load imbalance (in which a few nodes store alot of files while other have hardly anything). The paper describes a "Replica diversion" mechanism to correct load imbalance, in this mechanism a full node A stores a pointer to the file to be inserted in its table while the actual file is stored in a another node that is not in its leaf set and that didn't do any replica diversion yet (other wise we'd have pointers pointing to pointers and that just kills the performance). The other proposed mechanism is "File diversion", this mechanism is used when replica diversion fails. The solution in this case is simple, have the requester rehash the file with a different salt (to create a different fileID) and retry the insert query again. The requester might have to try this operation multiple times until it generates a fileID whose node can actually store the file (either locally or using replica diversion).Nodes can also cache highly demanded files to enhance performance. Finally, a study of the results obtained from experiments were discussed and analyzed to show the actual performance of this algorithm/system. The results were quit astonishing, global utilization was at about 98% and only 5% of the nodes failed at a stage of 95% storage utilization (quite a good number). As far as my comments go: i thought that this paper was good, the approach to solving that problem was simple and effective. From: Maifi Khan [maifi.khan@gmail.com] Sent: Monday, February 13, 2006 9:12 PM To: Indranil Gupta Subject: 598ig review 02/09 Title: Storage management and caching in PAST, a large scale, persistent peer-to-peer storage utility. Summary: PAST is consists of nodes that are connected to internet and capable of initiating and routing of requests for insertion, deletion or looking up files. This is build on top of Pastry which provides efficient routing for p2p overlay network with high fault tolerance, reliability and scalability. Each file is assigned a 160 bit unique fileId which is generated using SHA-1 algorithm. PAST store one file on k PAST nodes whose nodeId are numerically closest to the 128 bit msb of the fileId. This ensures that replicas of the file will be evenly distributed in the networks geographically. The underlying Pastry handles dynamic node joining and leaving to the network. Each Past node has a public/private key pair which is associated with a smartcard for ensuring security. PAST uses replica diversion and file diversion techniques to ensure balanced storage uses in the network. File diversion is to use a different salt to generate a different fileId to store the file in a different part of the network. PAST nodes also use caching to improve the lookup time for files. Caching uses unused portion of the disk space and can be reclaimed any time needed. Experimental result shows that as network storage utilization increases cache hit rate decreases due to less cache space and using cache reduces the number of hops to find a file. Criticism: 1. Reclaim operation does not guarantee that the file is going be deleted from all the nodes. If someone wants to remove a sensitive content from the PAST system, one cannot do it with certainty. 2. Current PAST system often fail to store fairly large files. 3. They did not suggest what the value of 'k' should be and how to choose it and its impact on performance of the network. Future work: 1. As in experiment shows, for large files insert request may fail. Further work is needed to be done to see how to split a file and store it on different nodes. We also need to estimate the tradeoffs like impact on availability, reliability and file lookup time for such schemes. 2. For deletion, in some application it is required to provide guaranteed deletion. This problem also needs to be solved. Title: SHARP: An architecture for secure resource peering. Summary: SHARP architecture is proposed for distributed resource management, resource control and resource sharing among sites and different trust domains. Resources are assigned for use by others for a limited time as lease model. SHARP claims are signed using cryptographic keys for security reasons. Any site may join the framework by establishing a peering relationship with any member site. Service manager is responsible for obtaining resources. Agents are responsible for mediating between sites and resource consumers. The service manager first need to obtain a Ticket from an agent and then presents the ticket to appropriate site authority to redeem the ticket. The site authority than may honor by giving lease for any subset of the resources specified in the ticket. They used over subscription of tickets for better resource utilization and to deal with agent failures. They also propose under subscription to maintain a reserve of resources for fault tolerance. They describe SRRP protocol for secure request routing. They also suggested to preloading of vserver to hide the access latency. Experimental results show that as load increases, over subscription increases resource utilization. Criticism: 1. SHARP does not support revocation of keys or certificates which may have crucial impact on security. 2. SHARP is to some extent vulnerable to Sybil attack. 3. It is possible to launch a DoS attack and misuse the resources. Future work: 1. Further work is needed to determine a policy whether to use over subscription or under subscription in different situation. 2. Efficient way for revocation of certificates is needed. From: Raghu Kiran Ganti Sent: Monday, February 13, 2006 7:41 PM To: Indranil Gupta Subject: 598ig review 02/14 Indy, Please find the review for the 02/14 class below. Thanks Raghu Review for P2P apps (Feb. 14th 2006) ------------------------------------ Paper title: Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility Summary: ------- The paper evaluates a storage management and caching scheme for large scale peer to peer networks. Files are inserted into the PAST network and are replicated over several nodes for availability. This allows for the data to be persistent and highly available across the Internet. Main ideas: ---------- The goal of PAST is to provide high availability and persistence for files stored in the Internet, and not provide searching facilities like Chord, Gnutella. Towards this, the main feature of PAST is to use Pastry as an underlying routing network and route the queries of insertion of files to a set of nodes, which can possibly store these files for availability. Nodes can contribute storage to the network, which is managed by PAST to store the inserted files. The main ideas behind load balancing is to do replica diversion and file diversion, as the names suggest, replica diversion diverts the replica storage to other nodes, and file diversion diverts the original file to a new node. Several policies for replica diversion, file diversion are presented in the paper. The cache management provides the PAST system to minimize the client access latencies. Strengths: --------- The main strength of the paper lies in the concept of the distributed storage in large scale P2P systems. The authors present a simple idea which is thoroughly evaluated. Weaknesses: ---------- Although the application is novel and quite interesting, in my opinion the main problem is- How many people are really willing to share hard disk space? As we know that Kazaa like p2p networks are 30% serve and 70% consume, it would be a similar situation when PAST is deployed! Another problem is when files are really large. It would be a bad idea to replicate a movie of size 1GB! Instead, RAID like methods which are quite tolerant to failures would be a good idea. Future work: ----------- It would be nice to extend the scheme so that anyone joining the PAST network will have to share resources in order to participate in file sharing. Also, security will be an important issue to be addressed. What if someone placed some malicious code on my hard-drive and the OS crashes, such situations are highly undesirable. **************************************************************************** Paper title: CoDNS: Improving DNS Performance and Reliability via Cooperative Lookups Summary: ------- The paper studies the DNS performance and points out the component which causes the increase in delay in DNS server response times are the intermittent failures caused on the client-side. The authors propose a cooperative DNS (co-DNS) which sends requests to peers to do the DNS name lookup. Main ideas: ---------- The main idea of the paper is to do peer DNS, where the peers on the network help a machine in doing the DNS conversion when the DNS server is down or slow. The authors present several variants of the simple version, which are optimizations. For example, one can set the delay to the local nameserver before sending out queries to the peer nodes. Strengths: --------- The strength of the paper lies in the evaluation. The evaluation is thorough and done well. Apart from that, the idea is not something that is highly novel. Weaknesses: ---------- The aggressive name lookup can cause the network bandwidth to be used in-efficiently, as it happens with any aggressive algorithm! The paper also does not do any adaptive tuning of the parameters, which could lead to better network bandwidth utilization. Future work: ----------- Implementing such a system on real nodes and studying the effects of such an implementation will be something interesting to do. From: Ravishankar Sathyam Sent: Monday, February 13, 2006 5:39 PM To: Indranil Gupta Subject: 598ig review 02/14 Ravi Sathyam Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility. Summary: This paper essentially presents a p2p overlay network (PAST) that’s used for storing files. Each PAST node and past file has an associated field ID (which is essentially random), and files are inserted into nodes with closely associated fields. PAST uses Pastry as the underlying protocol in order to route client requests to the appropriate nodes (and Pastry takes O(n) steps to search for the correct node). PAST employs smart-cards (which are associated with public/private key pairs) as the fundamental mode of security. When it comes to storage management, PAST tries to balance free storage amongst the nodes and also to maintain file with certain IDs within nodes with neighboring ID’s. But the above can be contrasting goals. Hence, PAST employs Replica Diversion (storing the replica in the node’s (of the desired set) leaf, which need not be amongst the k closest to the file). The paper then goes on to state three policies in order to control replica diversion. The first policy involves acceptance of replicas by a node (based on the size of the file and also the free space in the node). The second policy involves selecting another node to divert the replica, and the third policy involves diverting file to another node space (when the node chosen amongst the k closest declines, and the node it chooses also declines): this is done by selecting a new fileID and transferring it to the respective node space. PAST also uses a caching scheme in order to minimize the effects of network latencies when fetching a file. Experiments were then made and insertion statistics were then observed to improve inversely proportionally to the threshold for primary replica stores. The insertion statistics also improved, as expected, as the leaf set size increased. Here it is interesting to note that stats improve drastically when leaf set size doubles since larger leaf sets increase the scope for local balancing. Essentially for all of the experiments, PAST achieved high storage utilization and low file insert rejections. PAST also showed the effectiveness of caching schemes. Comments: When nodes of neighboring IDs are already organized into clusters (node spaces), I do not see the reason why Kelips wasn’t used as the underlying protocol for lookup and routing. On the other hand, a utilization of above 95% and a file insert rejection of less than 5 % for most cases is a great result! I did not see implementation of file encoding strategies, observing their effect on file and replica diversions would have been interesting. CoDNS: Improving DNS Performance and Reliability via Cooperative Lookups Summary: This paper introduces a DNS lookup service that achieves name resolution with lower latency and low overhead. In order to do so, it tries to understand client-side behavior. Essentially from PlanetLab tests, it is shown that there are a small set of lookups that contribute heavily to the total lookup time, and so if one addresses this small set of lookups, then one can dramatically reduce lookup times. However, we must first understand the various types of client-side failures that might occur. These include 1) Packet loss in a LAN environment –(since nameservers communicate using datagrams, even a single packet loss would require retransmission, and the resolver’s default timeout for retransmission is 5 sec), 2) Nameserver Overloading – Nameservers can be temporarily overloaded, a possible factor being the relatively small size of the UDP receive buffer on the nameserver resulting in dropped packets, 3) Resource Competition, and 4) Maintenance problems leading to service interruption. The main idea behind coDNS is to send lookups to peer nodes when the local name service is experiencing problems. We would mainly like to query peers who are nearby since this results in low round-trip latency. The paper states that if one can contact the local nameserver (as usual) first for a lookup, wait for a timeout and then issue x-1 simultaneous lookups using x-1 randomly selected peer nodes, then one can reduce the average response time by a factor of more than two. An important consideration here is bootstrapping, since if the local DNS service is slow, then resolving all the peer names will increase coDNS’s start time. Hence, coDNS starts immediately and begins resolving peer names in the background (the background resolver uses coDNS itself). Comments: I wonder if there can be any optimizations made to the simplicit lookup scheme presented here (issuing numerous simultaneous lookups using random nodes which also introduces additional network traffic) From: Praveen Jayachandran Sent: Monday, February 13, 2006 5:15 PM To: Indranil Gupta Subject: cs598ig review 02/13 CS598IG - Reviews, 02/13/2006 Praveen Jayachandran Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility, A. Rowstron et al, SOSP 2001 ------------------------------------------------------------------------ ----- This paper describes a large scale P2P storage utility called PAST. It is self-organizing and built on an overlay network of nodes that store replicas of files, cache copies of popular files, and cooperatively route file queries. The PAST system is built on top of the Pastry routing scheme, and derives properties such as routing within log(N) steps and using log(N) messages, where N is the number of nodes in the system, from Pastry. The use of the SHA-1 hash function to obtain node and file identifiers, provides geographical diversity in the location of replicas, which helps reduce the average fetch distance. The authors propose replica diversion and file diversion, together with some management policies as methods to balance the load on the network. The system supports caching popular files, in order to reduce the fetch time. The authors also propose some ideas to support security in PAST. Their experiments show that PAST can reduce fetching delay and can balance the query load for popular files. As future research, the authors intend to support searching, directory lookup, and key distribution with PAST. They also wish to study the potential benefits and costs involved in storing fragments of a file at separate nodes. Comments: 1. Is it possible that for two nodes A and B, A has a replica diverted to B, and B has a replica diverted to A, and are unnecessarily incurring the overhead and delay caused by replica diversion? I feel this could happen if A is heavily loaded initially, but later B becomes heavily loaded and A is lightly loaded. The authors do not mention any method to reclaim diverted replicas. Note that longer loops can probably occur, and can be quite difficult to detect. 2. The authors claim that it is preferable to divert a large file than multiple small ones. Although this is true, looking at the metric used by a node to accept a diverted replica, a smaller file has a greater chance of getting accepted than a larger file. Hence, it might be more prudent to transfer most of the smaller files, than to struggle finding a node to transfer the large file. Solving this problem probably requires smarter policies for choosing the node to host a replica. 3. Most of the graphs show a sharp rise in the failure rate when the utilization goes above 85%. How is this graceful degradation, that is claimed as one of the objectives of PAST? Under heavy load, is it possible to decide to store lesser than k copies, and when the load reduces, resurrect all the dropped copies? CoDNS: Improving DNS Performance and Reliability via Cooperative Lookups, KyoungSoo Park et al, OSDI 2004. ------------------------------------------------------------------------ -- The authors in this paper, assert using experimental studies that client-side DNS failures are frequent and widespread, and largely affect DNS performance. Packet loss, nameserver overloading, resource competition and maintenance issues are identified as some of the problems that cause intermittent (at times periodic) DNS server failures. They propose a lightweight, cooperative DNS lookup service called CoDNS. The main idea is to distribute DNS requests to neighboring DNS nameservers in order to achieve low-latency name resolution, and at the same time ensuring low overhead. Solutions to when to send remote DNS queries, how many peers to involve, which peers to select, and what gains to expect are investigated. Experiments conducted on PlanetLab suggest that CoDNS can reduce average lookup latency, and greatly reduces the lookup times of slow lookups. Comments: I feel that the importance of reducing the latency of a few slow DNS lookups has not been motivated. How critical is it to have a DNS lookup take more than 100ms? How critical is it to have 3% of DNS lookups take more than 100ms? The default timeout for retransmission is set to an extremely large value of 5s, which suggests that it cannot be very critical. So, what is the motivation to study this problem? I feel this has not been addressed adequately.