From: Christopher W Banek Sent: Thursday, October 07, 2004 11:39 AM To: Indranil Gupta Subject: 598ig summary 10/07 Summary of papers presented for 10/07 Christopher Banek M. Maekawa, et al. For this paper, when I tried to access it from the main web page, I had 404 errors and SQL errors from the ACM Portal site, so I have to go by what I remember when I read it before for the undergraduate distributed systems class. This paper is a key paper in early distributed systems, as it presents a very efficient way of providing mutual exclusion (big o of sqrt-n), although it is not without it's share of problems. When nodes repeatedly enter and leave the system, the quorum system must also be updated, making churn a possible problem. Also possible attackers who do not want to vote may lock the system in a deadlock sate. Deadlock detection needs to be used to ensure that due to network latency that two competing nodes wanting to enter the critical section do not get exactly the votes needed to lock each other out (aka, one is waiting for the other that is waiting for the other). All in all though, this algorithm is very powerful and fairly robust considering the circumstances (and problems with trying to achieve some sort of consensus type problem in a distributed network environment). Shi-Ding Lin, et al. This paper very much improves on the mutual exclusion protocols known today. Peer to Peer systems have a lot of problems dealing with mutual exclusion due to churn, and possible failures and attacks. Although I agree that most of their research is quite good and valid with statistical data to back it up, I find the following flaw with their methodology: they don't address possible attacking nodes. In a system for mutual exclusion, the system may starve given just a few adversary nodes in the system to hold it up for reaching consensus on anything. I think that for this reason, and I'm not sure that there are many ways around it. This is why that in large peer-to-peer systems, I think that the better way is to avoid the need for mutual exclusion in protocol and system design, and possibly just add support for the small amount of times where two clients might do something that would violate the needed mutual exclusion to correct whatever problems may arise. Naor and Wieder I'm sure that this paper has a lot of wonderful applications, but with my current level of understanding in graph theory, this paper really hasn't done much for me. The mathematics behind the proofs are incredibly complex and confusing, although probably less so if this were a field of interest or expertise on my part. I just mainly wonder how these protocols will work in a real network environment, given that computers and networks are much different than just weights on a directed graph. I think it would have been a good idea for them to present real world data that has been collected, although I do agree with their introduction onto what needs to be changed about voting and quorum systems in general. I think their conclusion is also somewhat lacking in telling the real world applications of their findings. From: Jin Liang [jinliang@cs.uiuc.edu] Sent: Thursday, October 07, 2004 11:06 AM To: Indranil Gupta Subject: cs598ig review 10/07 A Practical Distributed Mutual Exclusion Protocol in Dynamic Peer-to-peer Systems. This paper presents a quorum based distributed mutual exclusion protocol for dynamic peer to peer systems. The paper first gives a straightforward design of a quorum based mutual exclusion protocol, where a client sends request to enter critical section (CS) to all the replicas. Each replica grants access to the first request it has received. If a client gets permission from a majority of replicas, it becomes the winner. If not, it releases the permissions it has obtained, backoff for some time and retries the process. The paper shows this naive protocol has good safety property, given that the quorum size is large compared with the size of the replica set. However, the paper shows the performance of this protocol is poor, especially if the variance of latency and the request rate is high. This is because requests from different clients may arrive at different replicas in different order, and there is no cooperation between clients and replicas (i.e., clients just blindly retries uopn failure to obtain the lock). This paper then presents a sigma protocol, which improves the performance by installing a request queue at each replica. When a client infers (from the responses it received from the replicas) that nobody is winning, it can yield the permissions it has obtained. The replicas can then grant permission to the next client. Hopefully this would achieve more consistent view between the replicas, which means the system succeeds in selecting a winner to enter the CS. If a client infers that someone else is winning, it will wait instead of retry. Because the replicas already have its request, and will grant permission to it once other winners have finished the CS. To cope with various failures in a dynamic p2p system, the paper proposes two techniques. The first one is lease, which prevents deadlock when the client in the CS crashes. The second is informed backoff, where a replica informs a client how long to wait before a retry, based on its position in the request queue. This prevents client stuck when a replica loses its request queue. (In the basic sigma protocol, clients wait for replica notification before a retry. Here, replicas tell client to retry at a certain time. So clients won't wait for ever). A sqrt(N) algorithm for mutual exclusion in decentralized systems This paper presents an algorithm for distributed mutual exclusion that uses only sqrt(N) messages. The algorithm based on obtaining permissions from a quorum system. Assume the system has N nodes, each node i has a quorum set Si. And for all i and j, Si and Sj are non-disjoint. Therefore, if a node successfully obtains permissions from all members of its quorum set, no other node will do so before it releases its lock. Assume each node participates in D quorum sets, the paper shows that D should be equal to K, if all nodes are to equally share the responsibility for arbitration. As a result, it is shown that K should be equal to sqrt(N). This means the message complexity for a node to obtain exclusive access to the critical section is O(sqrt(N)). It needs to send request to all members in its quorum set, wait for their response, and release the lock if succeeded. The quorum system achieves mutual exclusion. To avoid deadlock, each request is tagged with a sequence number. The sequence number is used to determine the priority of different requests. If a member in Si receives a high priority request than the locking request, it will inquire node i of its status. If node i will not be able to succeed, it will release its permissions. This avoids cyclic wait, which means deadlock is not possible. The paper also discusses the creation of the quorum system, the message complexity, and node failures. However, it assumes node failures are detected by some other mechanism, and it assumes the dynamic information is replicated, which means an alternative node can simple take over the role of a crashed node. In contrast, the previous paper uses informed backoff to deal with loss of memory failures, thus does not need the replication of dynamic information. Scalable and Dynamic Quorum Systems This paper examines the probe complexity of quorum systems and their implementation. It introduces the basic ideas of quorum systems, the quality measures for quorum systems such as load and availability, and discusses the probe complexity, i.e., the time and message complexity needed to find a live quorum. The paper presents a tradeoff between the load of a quorum system and the probe complexity of the system. The algorithm is optimal in the sense that the number of probes required is linear in the minimum of load and the smallest quorum set size. The paper also talks about their path quorum system and analyzes dynamic paths in such a system. The also provide a construction for a dynamic and scalable quorum system which could be viewed as a dynamic adaptation of the path quorum system. The paper should have some concrete examples how this dynamic path system can be used by applications. From: Mayssam Sayyadian Sent: Thursday, October 07, 2004 11:00 AM To: Indranil Gupta Subject: cs598ig 10/7 review Mayssam Sayyadian NetId: sayyadia CS598ig Fall 04 - 10/7 reviews : Algorithms for Systems A sqrt-N algorithm for mutual exclusion in decentralized systems, M. Maekawa, ACM TOCS, Apr. 1985. This paper describes an algorithm for providing mutual exclusion in a network of computer systems. The algorithm is a non-token based one and is based on only communication between nodes. The total number of messages is to the order of sqrt(N). The algorithm assumes an underlying communication mechanism which preserves order of messages and is totally timestamped. In this algorithm whenever a node wants to enter a critical section it sends a request message for a lock to all members of its request set. Each node upon receiving a message sends one response at a time meaning it maintains a queue of requesters and responses to the first of them. The requester can enter the critical section only if it receives all the responses from the nodes in its request set. When the requester finishes with the critical section, it sends a release message to all the nodes in its request set. The node that receives the release message should unlock itself. If there are requests in its queue, then it removes the request with the smallest timestamp from the queue, locks itself for this new request, and sends the response to the requester. So the algorithm is to the order of 3 times the size of request set. It's proved in the paper that request set could be build best to the size of sqrt N and it sketches an algorithm for building it as well: The algorithm for building the request sets are at the heart of this algorithm which is as follows: if Si is the request set for Pi, the request sets has to satisfy Si . Sj = empty , for all i, j and Si, for all i, always contains Pi. Also it's desirable that |Si| would be equal to |Sj| = K, for all i, j, and for some K meaning that the request sets are of equal size, and each is of size K. Also O(Pi) = O(Pj) = D, for all i and j where O(Pi) denotes the number of occurrences of Pi in all request sets. This means each node is involved in D request sets. As for the N nodes each has a request set of size K, so total NK nodes are required and since there are N nodes, each site need to be duplicated K times so K would be equal to D. The request size id K and to get an idea about K we could consider the first request set, it has K nodes, each of them can be in K-1 other request sets. Each other request set should contain at least one of the nodes in the first request set. So Totally K(K-1) extra request sets other than the first one plus the first one means: N = K(K-1)+1 which conveys K = (almost) sqrt(N). To generate the request set one can assume that N = K (K-1) + 1, for some K, and K-1 is a prime number, so considering a matrix of size K-1 by K-1, K groups of K-1 nonintersecting sets could be generated. K-1 nonintersecting rows, K-1 nonintersecting columns, and (K-2) of (K-1) nonintersecting diagonals as well. For each number (out of the first K numbers) could be combined with each of the K-1 nonintersecting sets to produce K-1 of 1-element-intersected . sets. A problem with this algorithm is the possibility of deadlocks but they can be detected. After getting to know the deadlock it is resolved in a way that the node tries to ask the requester with the larger timestamp to give up the privilege and this resolves the problem. Also about this algorithm it's worth to mention that although although less messages need to be sent in this algorithm, the response time is still not much improved since a time delay of T for sending request messages and delay of T for sending response messages exists. So, the delay to get a response for entering the critical section is 2T. Finally the paper discusses the algorithm in two different settings: under heavy load and under light load. A practical distributed mutual exclusion protocol in dynamic peer-to-peer systems, S-D. Lin et al, IPTPS 2004. This paper is about mutual exclusion in a p2p network of peers. It proposes the Sigma protocol that is implemented inside a dynamic P2P DHT and deals with challenges of p2p systems. The high level idea here is to adopt queuing and cooperation between clients and replicas so as to enforce quorum consensus scheme. The paper contributes in the sense that it provides the algorithm, demonstrates the scalability of the algorithm with the size of the network, shows its robustness to contention, and declares how it is resilient to network latency variance and its fault-toleranace. The key idea of sigma algorithm is to use logical replicas and quorum consensus to deal with the dynamism of the p2p system. Scalable and dynamic quorum systems, Naor and Keidar, PODC 2003 This paper is about the issues related to the complexity of quorum systems and difficulty of implementation of them in dynamic environemtns. In the first section the paper contributes by algorithmic investigation of the complexity of finding a quorum in case of random failures. The authors show that there is a tradeoff between the load of the system and its probe complexity for non adaptive algorithms. Two optimal algorithms are presented: the first is non adaptive and meets the lower bound, and the second is adaptive with a probe complexity that is linear in the minimum between the size of the smallest quorum set and the load of the system. The other contribution is presenting Dynamic Paths as a suggestion for a dynamic and scalable quorum system, for a dynamic environemtn like p2p systems in which elements join and leave the system. The quorum system could be viewed as a dynamic adaptation of the paths system, and therefore has low load high availability and good probe complexity. Finally it is shown that it scales well as the number of elements grows. From: Vartika Bhandari Sent: Thursday, October 07, 2004 10:56 AM To: Indranil Gupta Subject: 598ig review 10/07 A sqrt(N) Algorithm for Mutual Exclusion in Decentralized Systems This paper presents an algorithm for distributed mutual exlusion based on the notion of quorums or voting sets. Each process has an associated quorum and requires the affirmative votes of all members of the quorum to obtain the token. The quorum sets for processes satisfy the property of pairwise non-empty intersection. This guarantees safety. Liveness (protection from deadlock) is ensures by stipulating that a process that realizes it cannot get all the votes in its quorum relinquishes those it holds. The algorithm has O(sqrt(N)) message complexity. The algorithm may be visualized as arranging all N processes in a sqrt(N) X sqrt(N) grid, and composing quorums as union of a row and a column.Degeneracy may be introduced if N is not a perfect square. Scalable and Dynamic Quorum Systems This paper considers the issue of implementing quorums in a dynamic environments where nodes may join and leave at will. The key requirements are that load be low, that availability be high (i.e. even if some node(s) fail, a quorum be available amongst remaining nodes), probe complexity (i.e. messages and time needed to find a quorum) be low, that integrrity be maintained despite membership changes and that the system scale to a large population. The authors consider two classes of algorithms viz. non-adaptive and adaptive. Non-adaptive algorithms commit themselves to a sequence of probes without acquiring knowledge of failures. This allows for parallel implememntation, but would cause higher message overhead. The authors how a point-of-tradeoff between load and probe complexity in these algorithms. A quorum system called Paths is analyzed. The system is defined on a grid of points and quorums sets are degined as unions of top-donw and left-right paths. In this sense, the construction is similar to Maekawa's algorithm. Some results from percolation theory are used to show that with high probability a quorum is found after Theta( sqrt(N) log (N) ) probes. An adaptive algorithm is then considered wherein failed regions can be circumvented. A dynamic variant is proposed for changing membership. The system is visualized as a voronoi tessellation with the nodes as generators. The addition or removel of a generator point only causes re-adjustment of voronoi cells in its locality. Thus Join/Leave can be implemented in a scalable fashion. The paper is definitely extremely interesting in terms of their treatment of dynamic membership changes. However, it would be interesting to consider whether the model used in this paper maps well to quorum implementation in a wireless network where the broadcast wireless channel leads to a different kind of connectivity and communication properties. A Practical Distributed Mutual Exclusion Protocol in Dynamic Peer-to-Peer Systems This paper considers the problem of providing a mutual exclusion mechanism in replicated systems built atop P2P DHTs. The authors draw a parallel with ALOHA. Clients seeking access to the crticial section send requests to all replicas and wait for responses. A client which gets a majority vote wins the round (simialr to gaining access of the channel). However if some of the replicaes undergo failure and recovery, the safety property may be violated, as that repliac would lose memory of earlier voting. Thus 3k+1 replicas are needed to guard against k replica resets (this would follow from a byzantine agreement kind of argument). The probability of violation of safety, as well as throughput are plotted. Results are shown to be similar to what is seen in ALOHA. To improve the protocol, two mechanisms are incorporated. Since variable path latencies between clients and replicas can lead to recurrence of contention patterns, a notion of queueing and re-shuffling is enforced. Besides, to reduce contention, clients are made to go into an active waiting state. This is fairly reminiscent of p-persistent CSMA. This paper is interesting in terms of the used analogy between token-access and channel access. From: James Richard Newell Sent: Thursday, October 07, 2004 10:47 AM To: Indranil Gupta Subject: 598ig review 10/07 Algorithm for Systems: vN, Mutual Exclusion in P2P, and Quorum Systems James Newell 10/06/2004 The first paper, "A vN Algorithm for Mutual Exclusion in Decentralized Systems," by Mamoru Maekawa provides a new algorithm for achieving distributed mutual exclusion in a message-passing network using only cvN messages. Mutual exclusion in distributed systems is typically very difficult usually relying on a centralized system. Distributed systems are inefficient have many points of failure. The author proposes a new distributed technique that involves only locking a minimal subset of the nodes before entering the CS. The subsets are typically minimal, symmetric (or close to) and require at least one node to be shared between any given two subsets. This guarantees mutual exclusion (because you need a LOCK reply from every node in subset). The algorithm is slightly elaborate because it needs to prevent deadlock by ordering requests. The requests are given a priority by Lamport logical timestamps and Node GUID as tie-breaker. Nodes attempt to put older REQUESTS before new REQUESTS as much as possible, which also happens to eliminate starvation. A node will only unlock when it receives a RELEASE or RELINQUISH message from the locking CS requester. The subsets are created either using a (revised) finite projective plane algorithm or a grid point algorithm. The use of these subsets allows the message-passing performs to by some low multiple of vN, which is better than any current distributed algorithm. The authors also propose that failed nodes are resolved by taking over the role of the node with another working node. This paper makes a few assumptions that could complicate the system. For one thing everyone needs to be aware of everyone else (to construct the S set) and be assigned a GUID. It appears that there is a centralized approach to constructing the S set (it doesn't really talk about how nodes agree on what the sets are). Second it assumes no link failures and node failures are detectable. This can cause problems with lost requests or network partitions. Plus, we assumed all nodes are friendly (working correctly) and trusted. However, the idea seems very promising. There really is no efficient distributed mutual exclusion algorithm. The only really previously feasible method was centralized, but they voids peer-to-peer networks. This idea seemed to minimize message passing, which was the largest source of inefficiency of previous distributed algorithms. The next paper is titled "A Practical Distributed Mutual Exclusion Protocol in Dynamic Peer-to-Peer Systems" by Lin, Lian, Chen, and Zhang. This paper looks at mutual exclusion exclusively at peer-to-peer DHT networks. Their secondary goal is to provide increased performance compared to tradition "straw-man" protocols. For a given resource in the DHT, there are a few replicas distributed throughout the system. The "straw-man" protocol requires nodes to gain a lease from each replica of a resource before entering the CS. A lease is only provided is no one else currently else has the lease. Losers retry after a certain random back-off period. Mutual exclusion can only be broken when enough replicas reboot and forget who currently has the lease. The probability of this is considerable low for certain network parameters. Unfortunately, the performance of this system degrades to zero after a saturation point. The authors propose a revised version, called Sigma, which alleviates this problem by including request queues and enforce active waiting on requesting nodes. When there is no obvious winner (no consensus among replicas) a YIELD response forces the replicas to shuffle their queues around until a consensus is reached. Replicas also employ an informed backoff to inform requestors and estimation of how long they should wait till requesting again. The authors then go on to show that their protocol doesn't degrade to zero after saturation, but instead levels off. This is even true when nodes crash and reset their memory. This is really difficult issue because not only are you dealing with distributed mutual exclusion (which is tricky enough), this paper tackles peer-to-peer DHT where peers are transient and global consistency is difficult. I was a little unclear on the YIELD queue shuffling was guaranteed to converge to a consistent state among the replicas. It seemed to me that they just take the next-in-line item, so it could continually run forever in a state of non-consensus. Regardless, this is a very practical application in P2P DHT because it could ensure write/update consistency by locking files or data section before performing the transaction. Hopefully, the future will bring a usable implementations with Chord or Kelips. The last paper is "Scalable and Dynamic Quorum Systems" by Naor and Wieder. This highly conceptual and mathematical paper investigates the probe complexity of quorum systems and their specific P2P implementations. A quorum is an intersecting family of sets over some universe. They are used often in distributed systems. For example, quorums that have a non-null intersection property can be used to ensure mutual exclusion. Using a paths algorithm, you can ensure that this property can hold. The sets are constructed by either an up-down(ish) direction or a left-right(ish) direction. This ensures that every vertical and horizontal paths will cross and hence will share a node. The Paths quorum system is shown to have effective both non-adaptive and adaptive algorithms. Therefore, it offers a balance between different quality measures. The authors propose their own version of Paths, called Dynamic Paths which allows for nodes to leave, join, and fail in the network. It is shown that DPATHs scales well, allows for a dynamic setting, and keeps the same properties of the original Paths algorithm. The authors go to provide a bunch or theorems and proofs that demonstrate quorum complexity and its corresponding bounds. This is a good paper in that it provides all the mathematical proof to backup claims about many of the mutual exclusion algorithms. It also contributes it own revisions and optimizations to these algorithms. Hopefully, future designers will consider its contributions when they build new dynamic distributed systems. From: Zahid Anwar Sent: Thursday, October 07, 2004 10:37 AM To: Indranil Gupta Subject: 598ig review 10/7 Zahid Anwar NetId: anwar CS598IG Review Sqrt N Algo for mutual Exclusion --------------------------------------- Summary ------- This paper observes that in order for a process to enter a critical section, it is not necessary for all of its peers to grant it access. Subsets of the peers are enough to grant permission as long as subsets used by any two processes overlap. This way processes do voting for one another to enter the critical section. A 'candidate' process must collect sufficient votes to enter. To obtain an entry a process sends a request to all other K-1 members of the voting set and waits for all the replies before entering. A process will reply immediately unless its state is either HELD or it has already replied otherwise it queues the request till it gets a release and replies to the request at the head of the queue. To leave the critical section the process sends a release message to all other K-1 members. Strengths --------- + Historic paper with major implications on implementation of distributed mutual exclusion + Achieves safety property. No two processes can enter the critical section at the same time + Bandwidth utilization is 2sqrt(N) msgs per entry and sqrt(N) per exit which is superior to Ricart and Agrawala's algo 2(N-1) Weaknesses ---------- - Algorithm is deadlock-prone. For voting sets {p1,p2}, {p2,p3}, {p3, p1} the processes can enter a situation where each process has received one out of two replies, and still none can proceed. - Synchronization delay worse than R & A's since it requires a round trip time - Doesn't tolerate loss of messages if channels are unreliable - Can tolerate a crashed process only if it is not in a required voting set Future work ----------- - Adapted by Saunders 87 so that it is dead lock free. (Queue requests in happened-before order) Review A practical distributed mutual exclusion protocol -------------------------------------------------------- Summary ------- This paper improves on the basic strawman mutual exclusion quorum consensus algorithm to adapt it for peer-to-peer systems. The particular aspect of p2p that they are optimizing for is that there is lot of variance in network latency between a client and replica which makes it difficult for replicas to build a consistent view of competing replicas. They introduce lamport time stamps to help clients determine their 'place in the race'. Yield messages reshuffles queues at replicas for stabilizing and helping to decide who the winning client is. Replicas use a technique called informed back off to rebuild state on failure. Strengths --------- + Good performance improvement over conventional Strawman algorithm according to the simulation + Good probabilistic failure properties Weaknesses ---------- - Achieved better performance but at the cost of greater memory usage and message overhead - Assumes nodes will not be malicious Future work ----------- -Replicas need not suffer from complete memory loss. Can we save the queues to permanent storage and reuse them if replica comes up in a matter of seconds? Review Scalable and Dynamic Quorum Systems -------------------------------------------------------- Summary ------- Quorums are a popular technique to solving mutual exclusion, data replication and access control in distributed systems. A problem inherent with Quorums is that as the number of processors increases the 'algorithmic probe complexity' may increase unmanageably. This is the actual time and message complexity needed to find a live quorum. It is determined by the network and by the quorum system. The paper shows that there is a tradeoff between the load and the nonadaptive probe complexity of quorum systems in particular it is at least log n divided by the load. They come up with a non-adaptive algorithm for finding a quorum in the Paths quorum system. The Paths system is the first system shown to have an excellent balance between many somewhat contradictory measures of quality. The only system that has less probe complexity is the Crumbling System which suffers from high load. The authors also present a dynamic adaptation of the Paths system and show that it has low load, high availability and good probe complexity properties. Strengths -------- + Provided a good study of the different measures of quality for Quorums like load, availability, probe complexity, integrity, scalability, how they are related to each other and to operations like joins and leaves + Establish a interesting relevance to percolation theory Weaknesses ---------- - In order for it to achieve a probe complexity characteristic of their algorithm will the participants of the distributed system always have to be arranged in the LxL grid formation like that of Paths - How valid is the assumption that the portion of time a live quorum exists is the same for dynamic as for the static model models? Would have been good if they had given some reason for assuming ergodicity on the edge configuration. Future Work ----------- - A evaluation of the probe complexity/load of a working DHT would strengthen their claim that Dynamic Paths is an 'excellent candidate for an implementation of quorums in a dynamic distributed network.' From: Boris Capitanu Sent: Thursday, October 07, 2004 10:35 AM To: Indranil Gupta Subject: cs598ig review 10/07 A Practical Distributed Mutual Exclusion Protocol in Dynamic Peer-to-Peer Systems --------------------------------------------------------------------------------- This paper presents Sigma, a protocol implemented inside a dynamic P2P DHT designed to bring the benefits of mutual exclusion to P2P applications by adopting queueing and cooperation between clients and replicas. Queueing is used to deal with the variance of network latency between a client and each replica. To deal with contention in retrying due to collision, the protocol uses an active waiting strategy. A client starts by sending REQUEST messages to all replicas which, upon receipt, can grant its vote immediately or queue the request depending whether it has already voted. By examining the "owner" attached to each RESPONSE message, a client can determine if: 1. it is the winner by quorum consensus (that means it gets permission to enter the critical section); 2. some else won, in which case the client does nothing knowing that it was registered on the replicas already; 3. nobody won, in which case the client sends out a YIELD message to all the acquired replicas. A YIELD message indicates to the replica that it should remove the client from the winner seat and place it into the queue - its purpose being to reshuffle the queue. Using a combination of informed backoff and lease allows the protocol to achieve reliable communication over an unreliable channel. Informed backoff is a mechanism used to rebuild a restarted replica's state without overloading the healthy replicas. The experimental results show that the protocol offers high performance in heterogeneous network conditions with varying contention rates. The use of logical replicas and quorum consensus schemes allow the protocol to deal with the inherent dynamism of P2P networks. Having quasi-consistency and cooperation between clients and replicas makes the protocol highly tolerant to large variance in network latency and high contention. Scalable and Dynamic Quorum Systems ----------------------------------- This paper looks at quorum systems and analyzes the complexity of finding a quorum in case of random failures. A second contribution of the paper involves "Dynamic Paths", a dynamic and scalable quorum system extending on the Paths system, with low load, high availability, and good probe complexity. The metrics used in measuring the quality of dynamic quorum systems include load (minimal load on the busiest element) and availability (measures how resilient the system is). Depending on the type of application of the quorum system the metrics might have different interpretations. Dealing with temporary faults in the system is done by bypassing the faults (i.e. finding a quorum set for which all processors are alive). Finding a live quorum induces the notion of algorithmic probe complexity, which is the actual time and message complexity induced by the network and quorum system. Issues of integrity (preserving the intersection property when nodes join and leave the system) and scalability (adding nodes - processors - to the system should decrease the load on each processor) are also considered. A tradeoff between load and non-adaptive probe complexity of quorum systems is presented. The importance of non-adaptive algorithms stems from their ability to be implemented in parallel. This way, incurring a higher message complexity might lead to a reduction in the total time complexity of the algorithm. The authors present a very thorough mathematical analysis of the algorithmic probe complexity and offer algorithms (with proofs of correctness) for finding quorums in the Paths system. The calculation of the running time and message complexity considers the topology and implementation of the network over which the quorum system is defined. The Dynamic Paths Quorum System expands on the Paths system by substituting the grid with a continuous unit square which is decomposed into cells with each processor associated with a cell. Using a similar technique as the one for building DHTs, the decomposition of the square into cells is done using Voronoi diagrams. A quorum set consists of the union of the elements associated with the vertices that form a left-right path and a top-bottom path in the Delaunay graph. The join operation consists of three steps: 1. choosing a location x in the unit square; 2. finding the processor whose cell contains x and learn the location of its neighbors; 3. calculate the boundaries of the new Voronoi cell and inform the neighbors which in turn update their tables. The leave operation is similar - a processor that wishes to leave the system informs its neighbors which in turn divide its cell among themselves. The resulting system maintains the good properties of the Paths system, yet it operates in a dynamic setting. Using techniques inspired from P2P systems, Dynamic Paths is able to achieve low load, high availability, and good probe complexity. A SQRT(N) Algorithm for Mutual Exclusion in Decentralized Systems ----------------------------------------------------------------- In this paper the author describes an algorithm for achieving mutual exclusion in computer networks. The strength of this algorithm lies with the number of message exchanges required to achieve this mutual exclusion ability: only c*sqrt(N) messages are needed - where N is the number of nodes in the system, and c is a constant between 3 and 5. The algorithm assumes an error-free transmission subsystem with varying transit times, but in-order delivery guarantee. The algorithm relies in finding N subsets S_i that have certain important properties that guarantee minimum number of message exchanges to achieve mutual exclusion. The choice of S_i also guarantees that deadlocks and starvation cannot occur. Using a series of REQUEST, LOCKED, FAILED, INQUIRE, RELINQUISH, and RELEASE messages, as necessary, nodes in a particular S_i are able to coordinate and achieve mutual exclusion. After the description of the algorithm the author gives an example of a 13 node network where mutual exclusion is achieved, followed by a formal proof of the correctness of the algorithm (based on the properties of the subsets S_i). The number of messages required to create mutual exclusion depends on the choice of S_i. In some cases, a finite projective plane does not exist, in which case the desire of having symmetric S_i with minimum subset size must be abandoned. This results in somewhat unbalanced performance for some nodes which must now send extra messages. The author presents 2 methods of coming up with the subsets S_i. In comparing this algorithm with related work the author shows that under light demand the proposed algorithm outperforms the competition regardless of whether the choice of S_i was optimal (a finite projective plane exists) or not; under heavy demand, the author shows that the worst case scenario requires 5*(K-1) messages, but fails to give a direct comparison with the competition (as was the case for light demand). Node failure is also considered, the solution being that the failed node be removed from the system and another node assume its responsibility (through overtaking). From: Ellick M Chan Sent: Thursday, October 07, 2004 10:27 AM To: Indranil Gupta Subject: CS598ig Reviews 10/07/2004 Ellick Chan CS598ig Reviews 10/07/2004 Distributed quorum systems: There exist several types of quorum systems: Tree, grid, probabilistic, distributed hash/virtual and a centralized model. Distributed quorums rely on exploiting structural properties such as hierarchy, spatial locality, arbiters, and probability to ensure that only one node can obtain the critical section. A sqrt-N algorithm for mutual exclusion in decentralized systems: The square root of N algorithm for mutual exclusion algorithm allows a system to achieve ual exclusion in a system without having to contact every single node, like Ricarti and Agrawala. This is done by using a carefully chosen set of arbiters in a group of subsets with non-null intersections. The optimal choice for the subset size is N=K(K-1)+1, this satisfies the set properties and roughly translates to k=sqrt(N). The choice of K dictates the operation of the system, a K that is a power of a prime -1 yields the best results. Standard mutual exclusion problems such as deadlock and starvation are addressed by prioritizing operations and establishing a wait queue. The performance of the algorithm is 3(k-1) with light demand, and 4(k-1) with heavy demand. The k-1 extra messages results from the extra relinquish messages generated when the CS is freed. This work is closest to the grid quorum, where the grid size is nxn and the operational complexity is proportional to the set size, sqrt(n). This is because our arbiter sets are chosen optimally to be of a size that satisfies the non-null intersection set property. This work is important because it formalizes and proposes a solution for a set of conditions that guarantee an optimal arbiter set selection for the mutual exclusion problem. New work includes tree and probabilistic quorums, which yield tighter bounds. A practical distributed mutual exclusion protocol in dynamic peer-to-peer systems: Distributed mutual exclusion algorithms typically rely on most node to be relatively stable, i.e. node joins and exits are uncommon. P2P systems challenge traditional distributed exclusion because of a variant number of nodes to contact (Ricarti/Agrawala and Raymd require knowledge of N), and high levels of churn. The authors of this paper propose the use of a churn-resistant DHT. However, this does not solve the problem of the knowledge of N. To solve the problem of counting N and scalability, they propose the use of virtual nodes containing logical replicas. The nodes in the DHT form logical space without holes. This works the same way as Chord; when a node is not present, the successor node is responsible for storing the keys. This leads to scalability because each node only needs hold log(N) virtual nodes, and upon an entry/exit, only log^2(n) entries need to be exchanged. Upon a quorum request, consistent hashing is used to find the node currently containing the replicas responsible for protecting the CS. These may lie on real or virtual nodes. This approach works fairly well in P2P systems, but it may suffer from several problems. First, Chord membership changes introduce transient holes as keys are transferred during entry/exit. Secondly, Chord itself suffers from the possibility of ring partitioning; this requires repair algorithms. Finally, when a node holding the CS crashes, the system may deadlock. Some of these problems are solved by the use of a leasing system, which times out on the CS if a node does not relinquish control. They further optimize this technique by using priority-based backoff methods. To address adversarial issues, the algorithm is able of tolerating at most k resets, where k is determined by n=3k+1. This result is similar to the minimal number of nodes in Byzantine Generals to reach consensus. Robustness of the system can be guaranteed as long as the proportion of maligned nodes is within a threshold. Scalable and dynamic quorum systems: The scalable and dynamic quorum system attempts to solve the problem of P2P churn in a distributed quorum network. They address the issue through two primary methods. The first method is the use of a grid, which deterministically places a node in a quorum group based on grid location. In the second half of the paper, they discuss the extension of this grid into a continuous space, and the use of Voronoi diagrams to effectively partition the space into “grid cells” that allow an infinite number of nodes to join without adversely affecting scalability. With use of the system comes a tradeoff of algorithmic complexity to the load of individual nodes. When less data is stored at nodes, probe complexity increases as a result. The load of the PATHS system is theta(1/sqrt(n)), in both the static and dynamic cases. PATHS allows users to choose a balanced set of requirements to result in an overall well-behaved system with respect to most loads. This review has referenced the below URL for supplementary information: [http://www.cs.utexas.edu/~lorenzo/corsi/cs395t/04S/notes/ quorumnotes.pdf] From: Thadpong Pongthawornkamol Sent: Thursday, October 07, 2004 10:24 AM To: Indranil Gupta Cc: Thadpong Pongthawornkamol Subject: 598ig review 10/07 Algorithms for Systems A sqrt(N) Algorithm for Mutual Exclusion in Decentralized Systems The paper proposed by Maekawa describes a classic algorithm to reach a consensus on mutual exclusion among member nodes in the network. Any nodes in the network can request for and get the permission to enter any critical region by sending a request message to its corresponding arbitrator set of nodes. If the requesting node receives permission message from every node in its arbitrator set, it can then proceed into the critical region. On the other hand, if it receives fail messages from some nodes in its arbitrator set, it can determine that another request has prevented it from entering the critical region, and it has to relinquish all previous permissions and wait until all nodes in the arbitrator set resend permission message to it again. The most interesting part of the algorithm is how to specify the corresponding arbitrator set for each node in the network so that the consensus on mutual exclusion will reach every node in the network. An important criterion to achieve such goal is to construct each node’s arbitrator set in a way that the non-null intersection property is maintained, so that any pairs of arbitrator set have at least one common node as a “gateway” to communicate a consensus among arbitrators. The paper then suggests the technique derived from N-point finite projective plane algorithm to construct such arbitrator sets. The algorithm constructs each node’s arbitrator set which contains sqrt(N) members. This results in sqrt(N) degree of messages sent into the network to acquire mutual exclusion, which is scalable. The concept of priority and relinquish message prevents any requesting node from deadlock and starvation. In conclusion, the paper gives a decentralized, scalable algorithm to achieve the consensus on mutual exclusion among member nodes in the network. The proof of the algorithm is very elegant and easy to understand. However, the paper does not mention the real algorithm to construct a node’s corresponding arbitrator set but leaves the reader to find the algorithm from N-point finite projective plane problem. Moreover, to achieve the proposed algorithm, a strict membership management is needed for the system in order to acquire all members to construct the arbitrator set. The paper also does not address the mechanism retain the stability of the system in the case of dynamic system where nodes dynamically join and leave. A Practical Distributed Mutual Exclusion Protocol in Dynamic Peer-to-Peer Systems The paper proposed by Lin, Lian, Chen, and Zhang presents another practical method to achieve a mutual exclusion in P2P systems. Zigma protocol, the protocol presented, interacts with only logical entity of P2P DHT and neglects the underlying physical structure of the network. The protocol is claimed to be able to handle dynamic failures. The SIGMA protocol described in the paper is simple and easy to implement. In order to achieve the permission to go into a critical section, the requesting node sends the request to all replicas and waits for their replies. There are three possible outcomes from such request. If the requesting node receives permission grants from more than half of the replicas, the requesting node wins and can proceed into the critical section. If the requesting node receives fail messages more enough to determine that the permission is granted to another node, it does nothing but wait. The last possible outcome is that it cannot decide which node receives the grant, which leads the requesting node to send yield messages and wait until the permission is granted. The concept of informed backoff can be used to construct service queue of any damaged replicas in order to maintain the system integrity under partial replica failures. The concept of lease prevents any granted node from holding the critical section forever. In conclusion, the paper presents a simple algorithm to achieve mutual exclusion that works well with dynamic behavior of peer-to-peer system. The simplicity of the algorithm incurs O(N) message overhead in DHT lookup process, which overall latency is hard to predict unless the underlying DHT implementation is known. Scalable and Dynamic Quorum Systems The paper by Naor, and Wieder presents an intensively algorithmic model to achieve a quorum system called Path. The paper first presents non-adaptive and adaptive algorithm to construct a quorum system that can tolerate a level of node failures. The paper then presents the dynamic version of Paths quorum system which exhibits fault-tolerance and churn-resistance, which are two important characteristics of Peer-to-Peer network. In traditional Paths system, the paper models the network as a two-dimension grid which each node is mapped to corresponding edge in the grid. The paper then maps the problem of finding a live quorum under some failures to finding a left-right path when some edges in the grid fail. The non-adaptive algorithm yield sqrt(N)*log(n) elements needed to be accessed to find a live quorum. However, the algorithm can be done in parallel. The total run time depends on the topology of underlying network. The paper then proposes an adaptive version of the algorithm which can achieve sqrt(N) probe complexity with high probability. The dynamic version of Paths quorum system is then introduced. Instead of mapping each node in the network to each edge in grid model, the dynamic algorithm generates Voronoi diagram on the network. The algorithm of generating Voronoi diagram is done locally at each node, only requiring each node to know the neighbor list. The Voronoi diagram can be locally recalculated when some nodes join or leave, yielding scalability and churn-resistance. A quorum set is then the union of the vertices that form a left-right path and top-bottom path in the corresponding Delaunay graph. The load of the system and the complexity of probing algorithm are acceptable with high probability. From: Dennis Y Chi Sent: Thursday, October 07, 2004 10:13 AM To: Indranil Gupta Subject: 598ig review 10/07 Dennis Chi 10/07/04 Algorithms for Systems A sqrtN Algorithm for Mutual Exclusion in Decentralized Systems This paper proposes a distributed algorithm for mutual exclusion that uses only O(sqrtn) messages. Each node must get permission to obtain mutual exclusion from a subset of the members, or a voting set. Each pair of voting sets has a non-null intersection, so there will be one common node that serves as an arbitrator. If two ore more nodes request mutual exclusion at the same time, then circular waiting or deadlock can occur. Therefore, requests must be uniquely identified and ordered by a timestamp and the requesting node’s id. Arbitrators can receive a higher priority request than the one it already voted for because transit times may vary. Therefore, arbitrators will contact nodes to determine whether they should relinquish control to nodes with higher precedence (earlier requests). When a node wants to exit the critical section, it sends release messages to its voting set. Analysis shows that the optimal voting set size is sqrtn, so when demand is light, the total number of messages is 3*sqrtn because the algorithm requires sqrtn messages for requests, replies, and releases. Under heavy demand the number of messages is bounded by 5*sqrtn. The paper proves that safety (mutual exclusion) and liveness (no deadlock or starvation) are ensured. Also, the paper shows that this algorithm will perform better than others when nodes fail. This mutual exclusion algorithm improves upon previous ones because there is no centralized controller, and requires the least number of messages. Also, the algorithm can handle variations in message delays by avoiding circular waiting. But one of the problems with this algorithm is that it assumes an error-free FIFO channel. If one request messages are lost, then the system will exhibit starvation. Although the paper the transfer of static and dynamic information in the presence of node failures, it also needs to make sure that safety and liveness are still guaranteed. A Practical Distributed Mutual Exclusion Protocol in Dynamic Peer-to-Peer Systems This paper presents Sigma, a distributed mutual exclusion protocol that is specifically designed for open environments with large numbers of unpredictable peers. The authors first analyze the strawman protocol, showing that it can provide realistic safety guarantees, but performance degrades as latency variance increases because client requests reach replicas at different times. The Sigma protocol addresses these problems. A client sends a request to all replicas, and each replica sends back a response with the current owner, and queues the request if the client is not the owner. If the client wins by quorum consensus, then it can entire the CS; if some other client wins then it does nothing, because it is in the replicas’ queue and will eventually be notified; otherwise, nobody won and the client sends a yield to all replicas that it owns. For a yield, replicas replace the client as the owner with the head of the queue, and add the client to the end of the queue. This allows replicas to achieve a more consistent state despite latency variance and high contention. Also, leases are used to ensure liveness when clients in the CS fail, and clients periodically resend requests so that replicas can rebuild state if they fail and reset. Simulation results show that the Sigma protocol can achieve ideal throughput, regardless of latency variance and lifetime of replicas. The Sigma protocol is interesting because it shows that safety cannot be guaranteed in a system with peer failures. Although the use of yields may help the replicas achieve a more consistent view, it seems like convergence and stabilization may be an issue when many replicas continually leave and join. Also, Sigma has security issues because it assumes that clients are not malicious and every requesting client knows who the owner of the replica is. Although the results show that Sigma performs well under latency variance, future work could be done in developing smarter hashing algorithms that take network variance into account when assigning resources to peers. Also, further testing can be done to see how the protocol performs for longer client accesses, because 10 seconds is not much time to modify a resource. Changes to the protocol can also be made to improve the security, so that the current owner of the replica is not sent to each requesting client. Scalable and Dynamic Quorum Systems This paper presents Dynamic Paths, a scalable quorum system that performs very well in dynamic environments. Dynamic quorum systems are more difficult to design than non-adaptive algorithms because they must not only deal with node failures, but also nodes joining and leaving the system. The performance of these types of quorum systems is based on the following metrics: load of each node, availability of the quorum, probe complexity for finding live quorums, integrity of the system (such that the non-null intersection property of quorums holds), and the scalability of the system (such that as the size of the system increases, the load on each node and the probe complexity remains low or decreases). The paper first proves that probe complexity is at least logn divided by the load for non-adaptive algorithms, highlighting the tradeoff between the two metrics. Intuitively, if the load decreases, then there are more nodes per quorum, so more probe complexity is needed to locate them. A non-adaptive algorithm applied to the Paths Quorum System supports that bound on probe complexity, while an adaptive algorithm contradicts the relationship by providing a lower probe complexity without exhibiting a high load. Since the adaptive version of Paths performs well, the authors simply adapt the system to a dynamic environment. The resulting system, Dynamic Paths, is ideal for dynamic networks because it exhibits low load and probe complexity, and high availability and scalability. This paper shows the difficulty and issues in developing a scalable quorum system for a dynamic environment. The Sigma protocol in the first paper was a little simpler because the client simply has to hash the name of the resource to find the clients, or quorum, holding the n replicas. From: Jintae Kim Sent: Thursday, October 07, 2004 9:45 AM To: Indranil Gupta Subject: 598ig review 10/07 CS598IG Review 10/7 Jintae Kim (kim28@uiuc.edu) A sqrt-N algorithm for mutual exclusion in decentralized systems, M. Maekawa, ACM TOCS, Apr. 1985. The motivation of this paper comes from an observation that a process only needs permission from subsets of their peers, in order to enter a critical section. A voting algorithm is proposed, in which a process should gain enough votes to enter. Since there is at least one common member of any two voting sets, it is ensured that at most one process can enter the critical section. The author shows that the optimal solution of the size of each voting set is sqrt(N) while achieving the mutual exclusion. This paper was written almost 20 years ago. At the time of publication, it was quite an improvement in terms of the number of messages required. While the previous algorithms needed O(n) messages, this algorithm requires O(sqrt(n)) messages. Another strength of this paper is that it is robust for some process failures, as each process does not need all nodes’ votes to enter the critical section. However, although this paper claims to be deadlock-free, I think this algorithm is deadlock-prone. If some processes concurrently request entry to the critical section, it is possible for each process to wait for the vote. Also, it assumes that all the messages are delivered reliably, which is not the case in real world. A practical distributed mutual exclusion protocol in dynamic peer-to-peer systems, S-D. Lin et al, IPTPS 2004. This paper proposes mutual exclusion protocols based on the voting method, under the condition that there are dynamic and logical memory replicas. In this paper, the authors present two algorithms. The first algorithm is called a strawman protocol, which behaves like ALOHA. Even this simple protocol accomplishes robust mutual exclusion with very high probability, but its performance is poor, essentially degrading to zero with increased request size. The second algorithm, called sigma protocol, based on quorum consensus, improves strawman protocol by solving two problems: variance of network latency between client and replica, and greedy behavior of the clients who keep on retrying when collision occurs. They install a queue at replicas and reshuffle them towards a consistent view in case of high contention to resolve the first problem. To solve the second problem, a strategy is adopted to enforce clients into a state of active waiting. Informed backoff and lease enables Sigma to communicate reliably even under failures or unreliable communication channels. Sigma protocol shows good performance and its ability to handle membership change. Scalable and dynamic quorum systems, Naor and Keidar, PODC 2003 A quorum system is an intersecting family of sets over some universe, essentially associated with processes. In this paper, the authors firstly present the algorithmic complexity of finding a quorum in case of random failures as well as the tradeoff between the probe complexity and load. They provide the analysis on probe complexity, such as lower bound for non- adaptive algorithms, tight upper bound for PATHS, and tight adaptive upper bound for PATHS. What is more relevant to peer- to-peer systems is their introduction to the dynamic paths quorum systems. In the dynamic quorum systems, which are based on the dynamic voronoi diagram, processors may join and leave the quorum system, so they try to keep the cost of join and leave low while maintaining the intersection property. While this paper shows that the PATHS system has optimal load and availability, and offers excellent balance, developing some reliable load balancing algorithm which maintains all voronoi cells in equal size seems to be a possible future work. Also, the analysis of this paper is mainly focuse on average cases in an asymptotic way, but it still remains a question to see how worst-case load or worst-case probe complexity will be, which I think is also a potential problem. From: Michael Treaster [treaster@cs.uiuc.edu] Sent: Thursday, October 07, 2004 9:39 AM To: Indranil Gupta Subject: 598ig review 10/07 Algorithms for Systems A Sqrt-N Algorithm for Mutual Exclusion in Decentralized Systems This paper describes a protocol for mutual exclusion in decentralized systems. The protocol is much more efficient than the other existing protocols at the time of the paper's publication, with this protocol requiring O(sqrt-N) messages as compared with previous works' O(N) approaches. This substantial improvement in protocol efficiency is the paper's primary research contribution. The protocol defines a clever arrangement of nodes where every node has at least one logical neighbor with each other node. When two nodes have a conflicting request, their shared neighbor node arbitrates the conflict. Therefore, each node need only inform its neighbor nodes of its requests, rather than broadcasting its requests to all nodes to put the request to a global vote. The protocol requires several simplifying and unrealistic assumptions to work properly. First, it is assumed that the only failure that will be observed is a situation where a node ceases to send messages and never recovers, and that the other nodes can detect this occurrence. The paper does not consider other types of failures, such as intermittent lapses in communication, failures in links between some nodes but not between others, malformation of messages, or failures that cannot be detected. Related to this shortcoming is the issue of security. The protocol has no protection against a malicious node seeking to bring down the system. An adversarial node could send malicious messages, ignore messages from other nodes, or make unnecessary resource requests in order to violate mutual exclusion or deadlock the system. However, it is important to note that this is the first iteration of the proposed protocol, and it was proposed at a time when security was not yet considered a matter of the utmost importance. The improvement in communication efficiency is significant contribution, and fault tolerance and security were reasonably left as future work. A Practical Distributed Mutual Exclusion Protocol in Dynamic Peer-to-Peer Systems This paper describes another mutual exclusion protocol for decentralized systems. The paper describes a scenario where a number of logical replicas (the other client nodes) exist for each critical resource. When a client node needs access to a resource, it informs each of the logical replicas. Each replica inserts each requesting node into local queue based on the order of the request. If any single node is first in the queue of some threshold number of replicas, that node is granted access to the resource. Otherwise, all queues are shuffled and the vote is taken again. If no shuffling is needed, then access to a resource approximates first-come, first serve. However, if shuffling is required, then the order of access to a resource is essentially arbitrary among the nodes in the queue at the time of shuffling. The primary contribution of this work is not clear. Mutual exclusion is achieved, but the cost in messages sent is much higher than in the sqrt-N paper. This work uses a different fault mode than the sqrt-N paper, where nodes fail and restart immediately instead of failing and remaining down forever. However, no attempt is made to explain why tolerance to this class of fault cannot be added to the sqrt-N protocol, and no other security or fault tolerance features are described to add value to this protocol. The paper presents some experimental results, but these results are drawn from an artificial testbed rather than from a live, active network, and results are compared against a protocol that is known to have a poor design. Scalable and Dynamic Quorum Systems This paper presents an extension to the sqrt-N work from 1985. More specifically, it describes methods for discovering and maintaining the shared neighbor sets (quorum sets) in the event of temporary node failures and in the event of nodes leaving or joining the system. The protocol is analyzed in the context of the load placed on each node (how many other nodes use it as a shared neighbor), the availability of the system in the event of a node failure, the algorithmic complexity of discovering the quorum sets, the preservation of the quorum sets when nodes enter or leave the system, and the scalability of the system with regard to load on individual nodes and the cost of enter and leave operations. As a result of these design considerations, the protocol does not always construct the quorum sets of minimum size that were presented in the 1985 paper. The protocol searches a grid for a connected path that crosses horizontally and another connected path that crosses vertically. In a dynamic system, the grid is replaced by a Voronoi diagram which is redrawn when nodes enter or leave the system. The grid represents the connectivity between nodes in the system, and a pair of intersecting paths represent a quorum set. The paper presents an analysis of the protocol with respect to the design parameters described above, and it also considers the possibility of some kinds of faults and attacks on the system. From: Pei-Hsi Chen Sent: Thursday, October 07, 2004 9:16 AM To: Indranil Gupta Subject: 598ig review 10/07 Pei-hsi Chen --A N^1/2 Algorithm for Mutual Exclusion in Decentralized Systems This paper presents a distributed algorithm that creates mutual exclusion using c*N^1/2 messages, where N is the number of nodes and c a constant between 3(under light demand) and 5(under heavy demand). To obtain mutual exclusion, node i needs to obtain a permission from each member of a subset Si of the nodes of the network. Si’s satisfy the pairwise nonnull intersection property. The optimal value of the size of Si is N^1/2. In order to prevent deadlock, nodes’ requests are given priority based on sequence number/node number pair. A node will yield to others if the priority of its request is lower than that of any other conflicting request. The proposed algorithm is optimal in terms of the number of messages used to create mutual exclusion among fully distributed algorithms, and also when node failure happens it requires fewer messages than the other two algorithms (Fully centralized and Richart and Agrawala’s algorithm) The assumptions for this algorithm are 1. The underlying communications network is error-free. 2. Message transit times may vary but messages between two nodes are delivered in the order sent. How to adapt the algorithm to tolerate link failure or message lost may need to be considered. --Scalable and Dynamic Quorum Systems This paper regards the algorithmic complexity of finding a quorum in case of random failures. It analyzes the algorithmic probe complexity of the Paths quorum system, and presents a non-adaptive algorithms and an adaptive algorithm with a probe complexity that is linear in the minimum between the size of the smallest quorum set and the load of the system. This paper also presents the Dynamic Paths Quorum System that operates in the dynamic model, where processors may join and leave. It has low load, high availability, good probe complexity, and it scales gracefully as the number of elements grows --A Practical Distributed Mutual Exclusion Protocol in Dynamic Peer-to-Peer Systems This paper addresses the mutual exclusion issues in open and dynamic P2P environment. First, through a straightforward strawman protocol, it is shown that the variance of network latency between one client and each replica is responsible for the large performance degradation, and also that it is hard for all replicas to build a consistent view of competing clients. Therefore, a new scheme called Sigma protocol is proposed to deal with these problems. It installs a queue at replicas and reshuffle them towards a consistent view in case of high contention, and adopts a strategy to enforce clients into a state of active waiting. Sigma also deals with failure by informed backoff and lease, making protocol fault-tolerant. From: Yookyung Jo Sent: Thursday, October 07, 2004 8:57 AM To: Indranil Gupta Subject: 598ig review 10/07 A sqrt-algorithm for mutual exclusion : Summary : Mutual exclusion algorithm that uses the number of messages of order sqrt N is presented. When a node checks only a subset of the whole nodes, to get into a critical section, an intersection of any two subsets of nodes should have non-zero number of nodes, to guarantee mutual exclusion. From this, given N (as a total number of nodes), K (=the minimum number of nodes in a subset) should satisfy N = K(K-1)+1. Thus, K is roughly sqrt N. The algorithm is, * a node wishing for a critical section sends "request" message to all nodes in its subset (request messages have total order, according to sequence number, and prioiry of node ID) * a node who received "request", locks itself if not locked already, and send "locked" message back to the initiator. If it is locked already for other node, then, if the request for the current lock has precedence over this "request", the node sends "FAILED" message back, if the "request" has precedence over the request of the current lock, then it sends "inquire" message to the intiator of the current lock. Any non-satisfied request is queued. * If a node, upon receiving "inquire", knows that it cannot go to critical section, then it sends "relinquish". If a node is done with critical section, it sends "release" message. * a node, upon receiving "release", or "relinquish" message unlocks itself. * a node, upon receiving "locked" message from all its subset members, can enter critical section. The algorithm thus needs order of K (== sqrt N) messages. a mathematical technique of finding a finite projective plan of N points can be used to find subsets for N, or, an easier solution of presenting nodes as lattice points of a square and taking a combination of a row and a column for a subset can be used. Comments : [C] : Unfair subset construction(contrary to the fair, symmetric way in the paper) might be useful to certain application scenario. Since the time ( roughly the # of messages) to get the lock is proportional to the size of its subset, For overall performance : we can assign smaller subsets to those nodes known to use the critical resource a lot, with sacrifice of assigning larger subsets to those rarely using the resource With some kind of pricing scheme : we can assign smaller subsets to those nodes who have paid for the service. [C] : The smaller the intersection of any two subsets is, the better the performance of the algorithm, but the system becomes very vulnerable to node failures. A practical distributed mutual exclusion protocol : Summary : A practical mutual exclusion protocol in P2P should ensure the performance and correctness under the situation of the large variance of network latency and high dynamism (frequent membership change and node failure). This protocol sets multiple replica(mutual exclusion server) for a resource. Multiple replica is resilient to dynamism compared to a single mutual exclusion server for a resource. Decoupling of replica and actual server gives simpler interface (replica failure and replacement is not visible to outside, except that the memory is lost). A client sends "request" message to all replica and wins the lock for critical section, if it received m out of n(# of replica) votes (m>n/2). Under dynamism, any replica can forget (meaning "replaced by a new node") that it already granted a vote to a client, and reissue the granting vote to other client. However, the probability that two clients receives locks for critical section is very low for the reasonable choice of m value and can be set arbitrarily low by increasing m value. Variance in network latency causes the system performs very poorly with high rate of request, because the variance causes the requests from different clients with some time gap to be received with much time overlap and thus resulting in none of the clients receiving majority votes. Queueing is introduced to fight over this problem. And "Yield" message can be issued from clients to reshuffle the queue of replicas and let them reach the consensus (majority vote to a single client) quickly. Informed backoff, renewable lease of CS are introduced to deal with anamoly caused by dynamism. Comments : [C] : When none of the clients get majority votes, the protocol says that the client can send "Yield" message to replicas, which will reshuffle the queue and result in quicker convergence to majority votes given to a single client. However, there is no logical (or intuitive) guarantee that this heuristic should work well. "Yield" messages also suffer from the variance of network latency, thus, replicas will receive Yield messages of many clients (overlapped in time). And likely, the reshuffled queues will still not be able to reach consensus. Even though the author says that "Yield" actually works well in the experiment, some justification (if not formal proof) might be desirable. Actually, setting up some kind of total ordering by giving priority to node ID or other values (just as in sqrt N mutual exclusion algorithm above) might work well. Scalable and dynamic quorum systems : Summary : Comments : From: William Conner [wconner@glsn33.ews.uiuc.edu] Sent: Thursday, October 07, 2004 7:41 AM To: Indranil Gupta Subject: 598ig review 10/07 A ROOT(N) ALGORITHM FOR MUTUAL EXCLUSION IN DECENTRALIZED SYSTEMS In this paper, an algorithm is proposed that uses c * root(N) messages to provide mutual exclusion. In this case, c is a constant between 3 and 5, and N is the number of nodes in the distributed system. A previous unanimous decision algorithm took 2(N-1) messages. Another voting-based algorithm that was well-known at the time of this paper's publication took N/2 messages. The algorithm presented in this paper essentially relies on a node requesting and receiving permission from a particular subset S of the N nodes in order for that node to proceed into its critical section. The choice of S for each node is important since the distributed system must have the nonnull intersection property. The nonnull intersection property states that any two nodes that request permission to enter critical sections from each member of their respective subsets must have at least one common node in those two subsets. This common node serves as an arbitrator to resolve conflicting mutual exclusion requests. With this protocol, when a node wants to enter its critical section, it sends a REQUEST to every member in its subset S. It then waits to receive a LOCKED message from all of the nodes in subset S. It is possible that along the way, a request will fail due to a mutual exclusion conflict at one of the nodes in S. In this case, a node might have to RELINQUISH its lock if another node has sent it an INQUIRE message. After exiting the critical section, a node will send a RELEASE message to each member of its subset S. The approach for handling node failure presented in the paper might not be adequate for modern distributed systems, especially in p2p systems. Having another node take over a failed node's role could lead to a serious problem in p2p systems with nodes always joining and leaving. Applying their approach to a p2p system where nodes are constantly being removed would result in many nodes (that stay connected for long periods) always taking the role of nodes that have just left the system. In fact, nodes could join and leave several times causing even more work to be shifted to nodes that stay connected to the system for a long time. In this case, it would probably be better to reorganize the subsets occasionally rather than overwhelming nodes that stay connected to the system for long periods of time. A PRACTICAL DISTRIBUTED MUTUAL EXCLUSION PROTOCOL IN DYNAMIC PEER-TO-PEER SYSTEMS This paper presents Sigma, which is a mutual exclusion protocol implemented inside a dynamic p2p distributed hash table. According to the authors, Sigma is designed to be simple, efficient, and robust. Before presenting Sigma, the authors present a straightforward strawman protocol. This protocol consists of all clients wanting to enter their critical section sending requests to each of the replicas and waiting for responses. The replicas grant leases if they are not currently owned by anyone, otherwise it rejects the request and informs the client of the current owner. A client has won if it obtains some majority m of the replicas. Losers will release acquired votes (if any), back off, and retry after a random period. With this strawman approach, the authors noticed that network latency variance between clients and replicas caused poor throughput. In Sigma, clients send requests to replicas, receive votes, and one of the following occurs: 1) client wins by quorum consensus and enters critical section, 2) someone else wins, or 3) nobody wins, so client yields acquired replicas. A yield is essentially a release followed immediately by a request. Yielding allows replicas to reshuffle their wait queues to find the rightful winner (this could possibly take several rounds). Unlike the strawman approach, the network latency in Sigma does not adversely impact throughput. The next step for this line of research would be to use this particular mutual exclusion protocol in an actual peer-to-peer application (such as a distributed file system or distributed database). SCALABLE AND DYNAMIC QUORUM SYSTEMS This paper presents Dynamic Paths, a quorum system that is fit for a scalable and dynamic environment where processors leave and join at will. This system's suitability for such dynamic environments makes it appropriate for the p2p model. According to the authors, Dynamic Paths provides low load, high availability, and good probe complexity. Dynamic Paths addresses how the nonnull intersection property must be preserved while the system adapts in the face of peers leaving and joining the system. This paper also explores the tradeoff between trying to achieve a low load while maintaining a good probe complexity. Dynamic Paths is actually an extension of the Paths system that makes it suitable for p2p systems. It takes a continuous unit square and decomposes it into cells. Each node is associated with a cell. As nodes join and leave, the decomposition changes via Voronoi Diagrams. A performance evaluation of an actual implementation of this system would improve the paper. Although an analysis of the proposed algorithm is presented, a simulation or experiment comparing it to distributed mutual exclusion algorithms currently used in p2p systems would demonstrate that the proposed solution is both practical and verify that it realizes all of benefits mentioned in the analysis. From: Jigar Harshad Doshi Sent: Thursday, October 07, 2004 4:23 AM To: Indranil Gupta Subject: cs598ig review 10/07 A Practical Mutual Exclusion protocol in Dynamic Peer to Peer Systems This paper presents a protocol for mutual exclusion in dynamic peer to peer systems which takes into account churn and other factors in order to present a practicable solution. The peer to peer system is divided into clients and replicas wherein clients require access to replicas in order to change some aspect of the file and thus require mutual exclusion. The problem is not straight forward because of the high probability of failure in p2p systems. Thus the problem is characterized by a server which is available but may suffer from memory loss at random points of time. A ALOHA based protocol is first presented and shown to be inefficient. All clients that want to enter the CS send requests to each of the replicas and wait for responses. A replica grants a lease if it is not owned by anyone. Thus anyone who has more than n/2 favorable responses wins this round and can enter the CS> Losers back off for random amounts of time and try again. The protocol is shown to have 15-18 nines reliability. However the performance of the protocol is very bad and does not degrade gracefully. In fact the success rate becomes zero with increased contention. This is mainly because as nodes fail they try again and again and finally causes a flood. The authors surmise that this bad performance is due to the fact that the variance of network latency. Thus they build a new protocol called the sigma protocol. This protocol uses queing at each node when the lock operation fails. The timestamps are generated with lamports time stamps. If none of thecompeting clients win access to the critical section then the client sends out YIELD messages to each of the acquired replicas. This causes a reshuffling of the queues and this allows quick settling of the queues. One point ot note that unreliable communication media problems are treated as crashes of clients, Which may be inefficient though easy to implement. If a client crashes then replicas uses informed backoff to rebuild state. The simulation analyses show its advantages over the aloha based approach. Key Design Issues learnt: • Using Queues at client end to reduce contention • Achieving Mutual exclusion through simple voting Scalable and Dynamic Quorum Systems This paper presents approaches for designing scalable quorum systems. The paper shows that a tradeoff exists between load on a node and the size of the smallest quorum set. A quorum is a set of elements and a quorum system is a set system of quorums which satisfy a non null intersection property. Thus quorums are directly applicable to a lot of problems in distributed systems in general and p2p systems in particular like mutual exclusion. First a tradeoff is proposed for non adaptive algorithms which decide elements to probe without any knowledge of the network. They are easy to implement in parallel and hence popular. The paths system is then proposed which uses a grid based view of the network. Failure is modeled by a dual graph of the grid which intersects every edge in the grid. The algorithm tries to find a route from the left end to the right end which avoids all surviving edges of the dual graph (which denote failed links). The proposed algorithm is shown to have a complexity of ?(sqrt(n)). The authors claim that their algorithm is optimal although there exists a better algorithm called the crumbling walls algorithm. This is because crumbling walls does not perform well under high load. The algorithm is pretty simple and aims to travel horizontally moving vertically when a failed node is detected. Te worst case behavior is analused and found to be ?(n). Te authors then proceed to design a dynamic quorum system. They thus design a dynamic verson of the paths system . The system is now modeled as a Vornoi diagram, This handles nodes entering and leaving as simple recomputations of the Voronoi diagram. Using results form percolation theory the authors show that there exists a path from left to right with high probability. The systems is also shown to have high integrity. One of the key contributions of the paper is the design of quorum based systems which balance the different requirements. However as such no simulation results are shown so it remains ot be seen how the system behaves under high congestion and churn. A Sqrt(N) Algorithm for mutual exclusion in Decentralized systems This classical paper presents an algorithm for mutual exclusion in a distributed system. The algorithm has a better complexity metric than competing algorithms because it can achieve mutual exclusion with an asymptotic bound of sqrt(n) messages. This is much better than a lot of other approaches. The paper constructs such an algorithm by breaking the system into a number of sets. These sets are the quorum sets as seen in the previous paper. Each set intersects with all other set with atleast one element. Other properties are also proposed to keep the set to a minimum and achieve maximum efficiency. These include that all the sets are of the same size and each element belongs to a constant number of sets. An algorithm to find such sets is proposed and is found to be equivalent to finding a finite projective plane of order k. The set constructed is of size k = sqrt (n ) from which follow the complexity bounds. The algorithm proposed is quite simple. Each node contacts all nodes in its quorum set one by one. If the node is free then it grants access otherwise it places the request in a buffer. To prevent a deadlock mechanism the node currently holding the lock may be contacted ot know if it can RELINQUISH or RELEASE the request. Since each set intersects with every other set, it guarantees that any node getting a lock is guaranteed mutual exclusion. Proofs of the algorithms being mutually exclusive , deadlock free and starvation free are presented. The author then explains the steps involved in the creation of Si’s. As mentioned earlier this can be satisfied by a finite projection of k points. Since this may not always exist or easy to find a heuristic is supplied. Also methods to create unbalanced sets are proposed. The biggest contribution of this paper is a simple algorithm to provide mutual exclusion in a distributed system. The algorithm can be extended or modified to work in new areas such as sensor networks or p2p systems. From: Pradeep Kyasanur [kyasanur@crhc.uiuc.edu] Sent: Thursday, October 07, 2004 12:48 AM To: Indranil Gupta Subject: 598ig review 10/07 Pradeep Kyasanur 1. A sqrt-N algorithm for mutual exclusion in decentralized systems, M. Maekawa, ACM TOCS, Apr. 1985. Mutual inclusion is an important primitive for distributed systems. Truly distributed implementation of mutual inclusion requires each node to equally participate in the process of locking. This paper proves that mutual exclusion in decentralized systems will require at least O(sqrt(n)) messages. An optimal algorithm that achieves the lower bound is proposed and proved to be correct, i.e., the algorithm guarantees mutual exclusion, prevents deadlock and avoids starvation. The paper also computes the cost of recovering from node failures, and shows that there is a tradeoff between cost of mutual exclusion and cost for recovering from node failures. Comments: + The proposed protocol is scalable to large networks as it requires sub-linear, O(sqrt(n)), messages. - A key assumption in this paper, as well as related work, is the availability of reliable in-order message delivery system. While this assumption may have been suitable for distributed systems connected by dedicated network, it is not applicable for wide area systems. As the next paper seems to suggest, in real networks only probabilistic guarantees can be provided. 2. A practical distributed mutual exclusion protocol in dynamic peer-to-peer systems, S-D. Lin et al, IPTPS 2004. This paper proposes a mutual exclusion solution for peer-to-peer systems. In peer-to-peer networks, there is a high churn rate. As a consequence, a host required for quorum may leave the network and may be replaced by a newly joined host. However, the new host is unaware of the state of the old host, and may take actions which contradict prior decisions. Unless care is taken in protocol design, mutual exclusion properties cannot be assured. This paper argues for greater cooperation between the client requesting the lock, and the replicas voting to provide mutual exclusion. Further, to enable newly joined hosts to reconstruct past actions, "informed backoff" mechanism is proposed. Using this mechanism, clients periodically retry their requests, thus a newly joined node can build its queue of previous pending requests. The protocol has been implemented and deployed on a testbed and experimental results indicate good performance. Comments: + The paper proposes the technique of increased cooperation between clients and replicas as a way of enhancing the performance of mutual exclusion protocol. This design technique may be applicable for other problems as well. Intuitively, this protocol is moving part of the system state from replicas to clients. - A flip side of moving state to the clients is the possibility of malicious clients. Good behavior of clients is needed to build the request queue of a newly joined client. I suspect there may be simple attacks possible which freezes the mutual exclusion protocol by explicitly populating newly joined replicas with incorrect state. 3. Scalable and dynamic quorum systems, Naor and Keidar, PODC 2003 This paper studies the complexity of building quorum systems in dynamic environments. Quorum systems are a fundamental requirement for many other primitives such as mutual exclusion. The distributed system is studied under a model where churn rate is high and node failures is possible. One problem is to detect when a quorum set is unavailable due to some node failures. It is shown that the complexity of probing for the availability of a quorum set with an adaptive algorithm is at least O(sqrt(n)). An "Dynamic Paths" system that meets this lower bound is presented. The system is also shown to have low load and high availabilit. Comments: + The paper identifies many possible implementations of quorum systems and identifies the lower bounds on probe complexity. From: Charles Yang [cmyang2@gmail.com] Sent: Wednesday, October 06, 2004 10:28 PM To: Indranil Gupta Subject: 598ig review 10/07 Charles Yang (cmyang2) 598ig, Fall 2004 10/07 Algorithms for Systems A Sqrt(N) Algorithm for Mutual Exclusion in Decentralized Systems. M. Maekawa. The author proposes a decentralized mutual exclusion algorithm to be used for a distributed system. However, it assumes an error free underlying communications network, and in-order delivery. It requires 3 x sqrt(n) msgs per mutual exclusion, sqrt(n) msgs each for requests, for obtaining permissions, and for releasing mutual exclusion. The algorithm requires that the network be partitioned into different sets, such that the nonnull intersection property is held. Basically, this makes sure that there is a tiebreaker in every set, and each set is connected to each other. Each node sends & receives the same # of msgs per mutual exclusion, and each node is given equal responsibility in determining mutual exclusion. The set size K is shown in the paper to be optimal at K=sqrt(N) where N is the # of nodes in the network. The algorithm depends on each request having a priority which is dependent on timestamp of the request, and the node #, and higher priority requests can preempt those of lower quality. This enables deadlock avoidance. Under light demand, it is shown that the algorithm requires 3 x sqrt(N) msgs per mutual exclusion, and under heavy loads: 4 x sqrt(N). With node failures, an addition sqrt(N) msgs are required to regain the lost information. A Practical Distributed Mutual Exclusion Protocol in Dynamic Peer-to-Peer Systems. S.D. Lin, Q. Lian, M. Chen, Z. Zhang. Mutual exclusion is one of the fundamental and well-studied primitives to be implemented on distributed systems. A distributed file system, for example, requires a concurrency control mechanism in order to function. The authors present the Sigma protocol; it is implemented inside a P2P DHT. The first introduce a strawman protocol, which basically requires a quorum in order to reach a consensus. Performance is sub-optimal as throughput increases linearly when contention is low, but reach a peak and then degrades essentially to zero. The Sigma protocol builds upon the strawman protocol by using a queue at replicas (those nodes participating in the consensus) and introducing the notion of active waiting. Basically, if a client (as well as others) requests a consensus and concludes that nobody won, then it can send a YIELD (RELEASE+REQUEST) msg which will allow the reshuffling of the queues in order to allow for a winner. However, this will not guarantee FCFS behavior. Experimental results indicate that the throughput increases linearly and then essentially stay near the peak rate. Nitpick: While I'm usually not a stickler for grammar (especially when English is a 2nd language) the frequent grammatical errors of this paper actually distracted and made the understanding of the paper more difficult. Scalable and Dynamic Quorum Systems. M. Naor, U. Wieder. The paper investigates the algorithmic complexity of finding a quorum, and proposes a quorum system. First off, the paper shows that there is a trade off between the load of a quorum system and the probe complexity for non-adaptive systems (choosing a quorum without any knowledge if certain elements failed or not). The paper then investigates the Paths Quorum System suggested by one of the author's previous papers. It introduces a non adaptive algorithm that matches sqrt(n) x log n non-adaptive probes, and can be implemented in parallel. They then introduce a more efficient adaptive algorithm with probe complexity on order of sqrt(n). They then introduce Dynamic Paths (Path + dynamic adaptivity)-- a scalable, low load, high availability, and low probe complexity system based on Path. They basically substitute their grid model of Path, and replace it with a continuous unit square. They break that square up into cells, and then partition the space out using Voroni diagrams. Then select a quorum by choosing those cells that lie along the x and y axis of any given pair set of lines.