From: Ravishankar Sathyam Sent: Thursday, March 16, 2006 9:18 AM To: Indranil Gupta Subject: 598ig review 3/16 Ravi Sathyam CS 598ig Service Capacity of Peer to Peer Networks Summary: The paper talks about the service capacity of P2P networks. According to the paper, two such models are presented for P2P networks, namely transient and steady-state. In the transient model, we start out with few servers containing the file. However, this number can increase rapidly since people who download the file also begin acting as servers, and we are interested in how fast this number increases. Here, a branching process is used to model the service capacity in transient model, and then the service capacity growth rate is then optimized. It is observed that subsequent bursts in demand lead to exponential growth in the throughput. In the steady-state model, service capacity of the system can track and scale with the offered loads. Now the average delay from the peer’s point of view is more important. Here, a Markov chain model is used to model the service capacity under the steady-state regime. It is seen that subsequent bursts in demand do not lead to an exponential growth unlike the transient model. Comments: In this paper for the transient model, we are automatically assuming that people who want to download will volunteer to upload their files like BitTorrent but this is not the case in many P2P networks today. In addition, will high churn rate and significant node (and/or path) failures negatively affect the service capacity growth rate in transient model? The capacity of wireless networks Summary: This paper analyzes the service capacity of mainly two types of wireless networks. One type of network considered has n randomly located nodes, each of which is capable of transmitting at a constant bitrate and transmitting to a common location. Here the capacity is Theta(W/Sqrt(nlogn)), independent of the way nodes are arranged. Such a network is known as a Random Network. Also, another type of network considered has n nodes arbitrarily located on a disk of unit area, and the range of each transmission is selected in order to maximize network capacity. Such a network is known as an Arbitrary Network. Under such constraints, even if the nodes are optimally placed on the disk, it is shown that the throughput cannot exceed Theta(W/Sqrt(n)). It is noted that even splitting the channel into several sub-channels does not change these results. Comments: It’s interesting to note that this paper presents rigid bounds for capacity for wireless networks. From: Bach Duy Bui Sent: Thursday, March 16, 2006 9:02 AM To: Indranil Gupta Subject: 598IG review 03/16 Service Capacity of Peer to Peer Networks 1. Summary This paper studies the service capacity of peer to peer file sharing applications. The service capacity is viewed in two regimes: the 'transient state' and the 'steady state'. In the transient regime, an interesting question is how quickly a P2P network starting with a limited number of servers can grow its service capacity to meet the load associated with a burst of demands. While the interesting question in the steady state is what is the average throughput/delay performance seen by a typical peer. This paper focuses on studying the performance characteristics of 2nd generation P2P applications considering performance as a function of time at the granularity of a single P2P transfer session. The transient service capacity is modeled by a branching process. The analysis of transient regime with deterministic model shows that: (i) the average delay seen by peers scales as log of n where n is the number of requests. (ii) the average delay is improved with a factor of m where m is the number dividing parts of a document. In addition to the deterministic model, the authors also considered the branching process model. For the steady state analysis, the authors proposed a Markov chain model. Trace measurements are used to confirm the proposed modeling analyses. 2. Comments It is interesting to see in this paper how analytical models are used to analyze the P2P system. In my opinion the branching model used in this paper does not accommodate to model a system where peers may download different parts of a document from different peers. A more general model is required for this purpose. The Capacity of Wireless Networks 1. Summary In this paper, a fundamental result on the capacity of wireless networks is introduced. Two models of successful reception of a transmission over one hop is used: (i) protocol model and, (ii) physical model. Two types of networks are considered: Arbitrary Networks (where the node locations, destinations of sources, and traffic demands, are all arbitrary) and Random Networks (where the nodes and their destinations are randomly chosen). Traffic capacities of Arbitrary Networks and Throughput capacities of Random Networks under different reception models are derived. The main results show that the throughputs decrease with the increase in the number of nodes. Splitting the channel into several sub-channels does not change any of these results. 2. Comments It would be good if the authors consider also the effect of channel access protocol overhead. It is shown in practical that this overhead can take as much as half of the available bandwidth depending on the in-used protocol i.e. collision versus collision-free. The effect of the channel access protocols can throttle the network capacity even more greatly in multi-hop settings where a node transmission blocks not only its neighborhood but also the vicinity of its neighbors to avoid the hidden node problem. From: mbakht@gmail.com on behalf of Mehedi Bakht Sent: Thursday, March 16, 2006 9:00 AM To: Indranil Gupta Subject: 598 IG Review: 16th March Title : 2 P2P or Not to P2P? Summary: The paper tries to provide some guideline in resolving the dilemma of determining whether one should or should not use P2P architecture for a particular application. A list of key parameters (budget, relevance, trust, rate of change, criticality, etc) is presented against which the need of an application can be evaluated. In brief, the authors note that low budget, importance of the content to many people, high trust between users, a low rate of changes in the system, and a low criticality index are the set of properties that tilt the case in favour of P2P applications. Comments: Accountability may be another factor that becomes important in determining applicability of P2P architecture. I am not sure whether it is contained in the five factors listed by the authors. Napster had to be shut down because it was held responsible for facilitating illegal fire sharing - but it is probably much harder to stop BitTorrents. Title: Scooped again Summary: This paper is an attempt to draw attention of researchers in the Grid Computing and the P2P community, showing them the striking similarity in the kind of research they are doing separately. The author examines the issues and fallacies that have kept the communities apart and urges those working on P2P applications to direct their attention to issues confronting Grid Computing. Comments: I found the paper unnecessarily biased towards Grid computing when it called for the people in the P2P community to consider research problems in the context of Grid or face extinction. If user popularity is any kind of determining factor, as claimed by the author at the very beginning, then P2P applications can sit comfortably for the time being without having any fear of being sidelined. Title: Service capacity of peer to peer networks Summary: This paper analyses the throughput and average delay of p2p systems and thus characterizes the service capacity of a p2p system. The focus is mainly on how the capacity of a P2P system behaves when a particular document in the system suddenly attracts heavy demand. The authors show that there can be 2 phases of demand. In the first phase, the the number of servers (that have a copy of the document) increases exponentially, thereby serving the pending requests quickly. In the second phase, the throughput stabilizes, even when peers leave the system at a considerably high rate. The authors conclude that performance of P2P systems will scale well with increase in offered load, and the degradation will be graceful even when peers leave the system at a high rate. Comments: The paper shows that multi-part downloading allows significant improvement in the performance as the file size increases. However, one other important thing may be the size of these "parts" or chunks. If they become too small then the overhead incurred may offset the performance improvement gained otherwise. It might be an interesting idea to find that "threshold" value. From: Maifi Khan [maifi.khan@gmail.com] Sent: Thursday, March 16, 2006 7:51 AM To: Indranil Gupta Subject: 598ig review 03/16 Title: Service capacity of peer to peer networks. Summary: This paper tries to formalize the service capacity for p2p file sharing application. They proposed some mathematical models and later tried to verify them using real data from BitTorrent. They identified two stages in the dynamics of p2p application. First stage is the transient flash crowd phase. In this stage service capacity is initially low but exponentially rise to meet the required capacity. This is because initially there are only one uploaders and many downloaders but as soon as a second node completes its download, it becomes uploader and in case of multi part download this rate is even faster. They model transient service capacity using a branching process. This is the reason of exponential growth in service capacity initially. In the second steady state phase, the overall service capacity shows slow increase and fluctuates based on demands. One interesting phenomenon is in the steady state burst of demand does not increase the service capacity exponentially but rather linearly. Discussion: One of the important aspects of p2p application that is suggested in this paper is that the dynamic service capacity offered by some nodes may not scale as effectively as it did in the initial stage. They argued that signaling overhead and credit scheme may be the reason for such reduced growth rate. Though this is not proved conclusively in the paper. Another aspect is the reduced throughput per peer as the number of peers grow after a certain level and they argued this as a effect of TCP flow control. One of the limitation of this paper is their analysis for the moment is limited for file sharing application which is dominant application in p2p network. But we have to be cautious to use this analysis for other types of application. Title: 2 P2P or Not 2 P2P Summary: This paper proposed a decision making process for deciding the suitability of p2p solution for a new application. Their axes of decision making process includes budget, resource relevance to participants, trust, rate of system change and criticality of application. According to them budget constraint is the most motivating factor for p2p solution. Next strong relevance of the resource is a motivating factor for success of p2p solution. High trust among participants make it suitable to deploy p2p solution easily. Foe mutually distrustful peers it is lot harder and may require some relaxing from application side to be able to apply p2p solution. Low rate of change as in digital preservation makes it more feasible to manage a p2p system even in case of mutual distrust. For a critical application it may argue for use a centralized authority and tighter control which is hard to achieve in p2p solution. They argued that based on these axes one may guide himself in deciding in favor or against of a p2p solution for a specific problem. Discussion: This paper is a more like philosophical paper and as they told in the paper that one of the purpose of this paper is to instigate discussion. It is hard to say whether their decision criterion is complete(most likely not). They analyzed some existing application to come up with the proposed axes. As p2p research becomes more matured and we see more application, we can expect these axes to change and also argument to vary a lot. Title: Scooped, Again Summary: This paper points out the similarities between grid computing and p2p structure and urges to combine the effort in these two areas to come up with better solution rather than working independently and end up with having a prevalent inefficient solution as HTTP and HTML for web. The motivation for p2p is mainly from research community where as Grid computing is motivated by real application demand by thousands of scientists. They feared that once the application is developed for grid, it will be very hard to change them later. They identified three misconceptions that kept the two communities separate 1. "The technical problems in Grid systems are different from those in p2p systems" , 2. "While the technical problems are similar, the architectures (physical topology, bandwidth availability and use, trust model, etc.) demand that the specific solutions be fundamentally different", 3. "Grid projects do not have the flexibility to try new algorithms/ideas because they have to get real work done. P2P research is all about this flexibility." Then they identified four categories of problems in both areas that can share solutions. Four categories are Formation (e.g. group membership protocol etc.) , Utilization (e.g. scheduling and handling contention, load balancing etc.), Coping with failure ( e.g. replication algorithm, security issues etc.) and Maintenance ( e.g. standardized API etc.). They also argued Anonymity in p2p community which still does not have a analog in Grid community to show that there are disjoint problems but may exist middle ground as pseudonymity. But the bottom line is there is a large set of overlapping problems that can be solved more efficiently jointly. Discussion: This paper tries to draw attention to a very important aspects of research and warned about that history may repeat itself if we are not careful right now. From their analysis it is evident that the overlapping problems and possibility of coming up with common solution is endless and require much effort. In my opinion the hard part is who will take the responsibility to merge these two areas. There will always be people working independently. May be there should be a common authority for these two areas to collaborate research and publish standards that can be followed by researchers in both community. From: Hussam Hasan Abu-Libdeh Sent: Thursday, March 16, 2006 2:54 AM To: Indranil Gupta Subject: 598ig review 03/16 The two papers that i'll review are: 1. Service Capacity of Peer to Peer Networks 2. The Capacity of Wireless Networks ---------------------------------------------- Service Capacity of Peer to Peer Networks: ---------------------------------------------- This paper modeled the service capacity of a P2P system in two regimes: One is the transient phase in which the system tries to catch up bursty demands (?ash crowd). Both the analytical model and trace measurements exhibit the exponential growth of service capacity during the transient phase. In the second regime, the steady state, we show that the service capacity of a P2P system will scale with and track the offered loads, so individual user’s performance will not degrade significantly. Moreover, both analysis and empirical data suggest that at higher offered loads and with cooperative users the system performance might improve. These characteristics are particularly desirable in systems designed to handle exceedingly bursty demands. This paper also discusses techniques that might help improve P2P performance. Multi-part combined with parallel uploading when properly optimized will generally improve system performance, particularly when peers exit the system at a high rate. A credit system might help provide peers incentives to share and thus improve performance. Yet even a simple credit system based on ‘short term’ history of peer’s upload download volume may limit the system’s capability to deal with ?ash crowds above and beyond an established steady state. Comments: ------------- I think this theoratical model was interesting and is also applicable for different applications such as p2p web caching, p2p file systems, and p2p DB system. --------------------------------------- The Capacity of Wireless Networks: --------------------------------------- Comments: This paper didn't focus on the mobility of different wireless nodes. So, how would you change or re-adapt the routing of traffic given the fact that nodes could change location or even fail ? I think it would be an interesting idea to ponder about. From: Long Hai Vu Sent: Thursday, March 16, 2006 12:04 AM To: Indranil Gupta Cc: Long Vu Subject: cs598ig review 03/16/2006 Name: Long Vu Class: CS598IG-Spring 06 REVIEWS 2 P2P or Not 2 P2P? In this paper authors try to find a way to judge how suitable a P2P architecture is for a specific P2P problem. They first start by introducing several main characteristics of a typical distributed system. Then, they review the range of proposed P2P systems. Finally, they use decision tree approach (from Machine Learning) to judge a path in a tree is success or failure. In essence, they convert the problem of justifying a P2P architecture into a classification problem in Machine Learning and then use one of the most classical method called decision tree to solve. To do this, they define a set of attributes (features, or characteristics) to identify a particular architecture. Each P2P architecture then is a vector whose elements are these attributes. Each element has a truth value (true or false). Discussion The proposed characteristics have only 2 values (high and low). What could they do if they are multiple value characteristics or they are numbers? They do not use any formal heuristic such as Information Gain or Gain ratio to construct the tree. Instead, they rank the characteristics based on their own justification. According to them, “relevance” is more important than “truth”, this seems not necessarily true with the system which focuses on security purpose. Therefore, their proposed order of characteristics could not be applied for all domains. Scooped, Again This is a very interesting paper. In this paper, authors compare and contrast P2P and Grid approach. They claim that P2P community should make use of infrastructures that Grid community has created so far. They encourage P2P community to incorporate Grid community’s achievements for P2P works. They first start by introducing Grid and P2P in term of definition, goal, and short history. Then, they compare the two approaches and point out their similarities and differences. One of the most important differences between these two is flexibility. While P2P is very flexible, nodes’ churn is rare in Grid. Grid is mainly for computation while P2P is for sharing, even though nowadays they have some overlaps. In fact, there are many techniques in Grid could be applied in P2P system such as nodes join and leave systems, membership protocol, resource discovery, security issues, etc. According to these authors, P2P has no standard and the later version usually obsoletes the previous ones while Grid has formal standard and later version could inherit and enhance the former ones. Finally, author call to an action from P2P community. That is, P2P community should utilize fruits created by Grid community. Service Capacity of Peer to Peer network In this paper authors study service capacity of peer to peer file sharing applications. The service capacity discussed in this paper depends on several factors such as data management, peer selection, admission and scheduling policy, and traffic. These factors are interrelated in complex ways. Main contributions: - Study the performance characteristics of 2nd generation P2P applications via both trace measurement and analysis. They consider performance as a function of time at the granularity of a single P2P transfer session. - Model the transient service capacity of a P2P network by a branching process. Relying on this model, they try to maximize the service capacity growth rate. - The second regime is the steady state. In this regime, they show that the service capacity of a P2P system could scale with and track the offered loads. Thus, although an individual user make bad performance, the overall performance of the system is not effected too much. Evaluation show that the higher the users corporate with each other, the better system performance is. From: Muyuan Wang [mwang2@uiuc.edu] Sent: Wednesday, March 15, 2006 11:46 PM To: Indranil Gupta Subject: 598ig review 03/16 2 P2P or Not 2 P2P? Summary: This paper present a heuristic 'decision tree' to judge whether an application is suitable for P2P. The paper first raises 3 major ideal properties for an application to adopt P2P technique: Self-organizing, Symmetric, and Decentralized Control. Then, 5 major criterion are given, which includes Budget, Resource relevance to participants, Trust, and Rate of system change, and Criticality. Several candidate problems are analyzed and after that, a decision tree is building using the above criterion, and corresponding applications are categorized accordingly. Finally they conclude that the characteristics that motivate a P2P solution are limited budget, high relevance of the resource, high trust between nodes, a low rate of system change, and a low criticality of the solution, among which budget is the most important factor, thus is put at the root of the decision tree. Comments: I agree that the 5 factors are all highly related to our decision of whether to use P2P, however, the paper did not provide a 'quantitative' approach to evaluate the relative importance of each factor. In other words, how can this decision tree be built? How can we choose the appropriate attributes to partition? Why not build the tree in other shape, with same attributes? If something like the 'entropy' measurements in decision tree building can be provided, the paper would be even better. Scooped, Again Summary: This paper compares the P2P approach and the Grid Computing. P2P focuses on distributed systems that take advantage of resources scattered throughout the Internet, while Grid concerns more about resource sharing, selection, and aggregation. P2P concerns fault-tolerance, scalability and availability, while Grid considers performance and QoS requirements more. They both consider storage utilization... Finally, the authors conclude that, if P2P research does not address the Grid, it may eventually be sidelined by defacto distributed algorithms that are less elegant but were used to solve Grid problems. Comments: looking at the P2P research today, it seems that P2P and Grid are still developing independently, and P2P is still powerful. Service Capacity of Peer to Peer Networks Summary: The paper considers the service capacity of P2P networks. First, the evolution of P2P system's throughput is divided into 2 regimes: 'transient state' and 'steady state'. The author argues that the concerns in the two phases are different. In the transient state regime, one might wish to assess how quickly the P2P network starting with a limited number of servers can grow the service capacity to meet the load need, thus is server constrained; while in the steady state, one might be more interested in the average throughput performance, which will be demand constrained. They model the transient service capacity of a P2P network by a branching process, and consider the optimization, based on this model. They also propose a simple steady state model to capture the impact of peer departures or document deletions on the stationary service capacity. Based on these models, they investigated BT system, and analyzed the traces obtained in the experiments. Comments: It is really a 'theoretical' paper, which uses markov chain model and branch process, such fancy things. Mathematic models plus experiments, I think it is a good way to make the paper seems very solid. The only question is, whether the model is really useful? Though the authors provide the motivation of modeling the system, they do not provide enough reason why it should be like this. If there's some theoretical or experimental to support this, it would be better. From: Raghu Kiran Ganti Sent: Wednesday, March 15, 2006 10:43 PM To: Indranil Gupta Subject: 598ig review 03/16 Reviews for A step back, March 16, 2006 ---------------------------------------- Paper title: 2 P2P or Not 2 P2P Summaries/Main ideas: --------------------- This paper presents a heuristic decision tree that designers can use to judge how suitable a P2P solution might be for a particular problem. The authors identify the characteristic axes which are used to make a decision of P2P or not P2P. These criteria include: 1. Budget - amount of money to spend on building a solution to the problem 2. Resource relevance to participants 3. Trust - how trust worthy is the environment 4. Rate of system change - Differing requirements for timeliness and consistency 5. Criticality The authors then analyze a variety of problems with proposed P2P solutions to determine if the above characteristics are exhibited. Comments: --------- 1. The paper identifies several characteristic axes for answering the question of whether P2P is relevant to a solve a given problem, but it is not clear that these characteristics will alone suffice. 2. It is a subjective opinion of the authors, there are other decision graphs which are not presented here (putting it in the authors' words). ******************************************** Paper title: Scooped, Again Summaries/Main ideas: --------------------- This paper addresses the issues of how P2P research can affect grid, and talks of the relevant issues (and the non-relevant) with respect to the grid. The authors believe that the difficulty of parallel development of the grid and the P2P systems is that the concepts of P2P, the grid will most likely develop in such a way that it is incompatible with the strong points of P2P systems. Three main fallacies that have kept the grid and P2P communities separate are identified in this paper, they are: 1. The technical problems in grid systems are different from those in p2p systems. 2. Although the technical problems are similar, the architectures demand that the specific solutions be fundamentally different. 3. Grid projects do not have the flexibility to try new algorithms/ideas because they have to get real work done, and on the other hand P2P research is all about flexibility. Although, there are differences between grid and p2p systems, some of the technical problems identified by the authors which are shared among the grid and p2p systems are: 1. Topology formation and peer discovery 2. Resource utilization 3. Coping with failure 4. Maintenance Comments: --------- 1. An interesting problem has been addressed by the authors, viz. how does P2P research affect grid systems? Although closely related, the research areas target different communities and there has not been much combined research in these two fields! ******************************************** Paper title: Service Capacity of Peer to Peer Networks Summaries/Main ideas: --------------------- This paper addresses the 'service capacity' of peer to peer file sharing applications. The authors view the service capacity from two regimes: the transient and the steady state. In the transient state, the network grows starting with a limited number of servers and the user is interested in the time it takes to grow. Whereas in the steady state regime, the average throughput/delay performance is seen by the typical user. The authors identify that the service capacity depends on several factors, which include data management, peer selection, admission and scheduling policy, and traffic. The main contributions of this paper are that they measure and study the performance characteristics of the P2P applications (BitTorrent in this case). They model performance as a function of time at the granularity of a single P2P transfer session. They use a branching process to model the transient service capacity and a simple Markov chain based model for the steady-state service capacity. The authors present a simple deterministic model initially, which leads to an exponential growth in the service capacity (through replication and later using multi-part downloads). But, this model does not account for the congestion in the network. To this end, they present a stochastic model that capture variability in serving peers due to congestion as well as aspects such as speed at which peers leave the system. The main idea behind this model is a branching process. In this model, each peer serves one other peer at a time. Several variations are presented to improve the model's accuracy, like modeling under parallel uploads, uncooperative peers under parallel uploading scenarios. Comments: --------- 1. It is an interesting piece of work, which models the capacity of P2P systems. 2. The paper at times is complicated to follow, and not very clear! From: Juan Jose Jaramillo Jimenez Sent: Wednesday, March 15, 2006 3:38 PM To: 'Indranil Gupta' Subject: 598ig review 03/16 1 Service Capacity of Peer to Peer Networks 1.1 Summary The paper tackles the problem of how to analyze Internet 'flash crowds', where unexpected rapid increases in the demand of particular objects occur. The authors suggest that the service capacity of P2P networks should be viewed in two regimes: the transient and steady state. In the transient state, a popular file has a limited number of servers storing it, but this number increases rapidly because peers that have already downloaded the file can now act as servers to help mitigate the download burden in the previous nodes. In the steady state, the number of servers becomes large enough to serve the intensity of requests, so performance from a peer's perspective is more important. During the transient state it is stated that it is interesting to assess how quickly a P2P network can grow its service capacity to meet the load associated with a bust of demands. In the steady state they say that it is important to find the average throughput/delay performance a given node receives. Because of this, the paper studies and models the performance characteristics of second generation P2P applications, focusing on system performance in the transient regime and user experience in the steady state. It is shown that during the transient state the growth of service capacity is exponential, while in steady state the service capacity of the P2P system scales with the offered loads, so user performance does not degrade significantly. It was also studied various techniques to help improve P2P performance, and it was found that although credit systems might help provide peer incentives to cooperate, those systems may limit the system's capability to deal with flash crowds above and beyond an established steady state. 1.2 Comments This is the first paper I read about a theoretical model to characterize the performance characteristics of a P2P system. The model focuses on analyzing flash crowds, and I think that this problem can also be interesting for cooperative web caching, since this kind of technology also suffers from the same problem. The similarities between web caching and pure P2P file sharing networks make it interesting to use the results in such scenarios. 2 The Capacity of Wireless Networks 2.1 Summary This paper analyzes the capacity of infrastuctureless multihop wireless networks for two scenarios: a. Arbitrary networks: n nodes are arbitrarily located in a disk of unit area in the plane. The traffic patterns and transmission ranges are optimally chosen to maximize network capacity. In this case, it has been found that the obtainable node throughput is \Theta(W/ \sqrt{n}) (where W is the bit rate). b. Random networks: n identical randomly located nodes, using a fixed range. The node throughput is \Theta(W/ \sqrt{n \log n}) The results also show that breaking the channel into subchannels does not help to improve network capacity. The analysis is performed using two different transmission models: (1) the protocol model, where a transmission is successful if and only if there is no other transmitter within a circular region around the receiver; (2) the physical model, where a transmission is successful if and only if the SINR is bigger than a threshold. The main implications of this result is that the throughput goes to zero when the number of nodes increases, so the authors suggest that designers should focus on building wireless networks with small sizes. However, a way to increase throughput for large networks is decreasing node density, so the interference between concurrent transmissions is lower. 2.2 Comments This landmark paper was the first one to characterize the capacity of wireless infrastructureless multihop networks. It presents a tight bound for network capacity, and presents valuable insights for designers to consider.