From: Justin Tulloss [jmtulloss@gmail.com]
Sent: Thursday, April 17, 2008 9:24 AM
To: Gupta, Indranil
Subject: 525 reviews 04/17

On the scale and performance of cooperative Web proxy caching

This paper presents an in-depth analysis of the practicality of cooperative web caches. The authors analyzed two large-scale networks and broke one of them into smaller networks to study the web request behavior of users across large, medium, and small networks. Their conclusions were depressing: there is very little benefit to developing cooperative caches. The authors found that cooperative caching is only beneficial on small networks, where a centralized cache would perform just fine. This seems to hold true for both short and extended cache life cycles. The main determinant of cache performance is the cacheability of documents, not the caching hardware itself.

Pros:
- A compelling argument
- Lots of data makes a strong argument
- Thorough. Addresses many cases in which they may be mistaken.

Cons:
- I didn't understand how they got the optimal caching numbers
- I would like it if they ran the same tests on a few more networks to demonstrate that these are not isolated results
- Ignored benefits of cooperative caching beyond performance

Squirrel: A decentralized peer-to-peer web cache

This paper details a cooperative web caching system based on the Pastry overlay. This overlay gives them a self-organizing, efficient, fault-tolerant architecture on which to build an infrastructure-free cache. It is intended for LANs with high geographic locality, which is the only setting in which a caching scheme is going to be worth the effort. The authors demonstrate that this caching scheme performs almost exactly the same as a centralized cache, but without the expensive infrastructure.

Pros:
- Good idea
- Simulation shows that it might work

Cons:
- Simulation only, I'd like to see some real-world data.
- Has to run on a LAN anyway; what LAN of a size that would benefit from this doesn't have some infrastructure?
- Web caching's benefits are slipping away as web pages become more and more dynamic.

From: Chandrasekar Ramachandran [cramach2@uiuc.edu]
Sent: Thursday, April 17, 2008 9:09 AM
To: indy@cs.uiuc.edu
Subject: 525 review 04/17

1. Squirrel: a decentralized, peer-to-peer web cache, S. Iyer et al., PODC 2002.

This paper proposes a simple, scalable and decentralized peer-to-peer web caching technique. Through efficient resource management and a design that uses some of the features of the Pastry service, the technique demonstrates that simple coordination and cooperation among nodes achieves many of the specified tasks without the need for costly hardware. The authors chose a typical corporate network as the target network for the design, and although this is not very large, it seems sufficient for a thorough evaluation and study.

Pros/Cons:
1. The authors account for node failure and the subsequent re-routing in their study, which is very important and necessary for establishing the robustness of the technique.
2. By evaluating the choice of storage location, the authors have shown the inherent flexibility of the technique in terms of the different networks it can be built upon.
3. Route caching on the Cambridge workload seems to have a higher hit rate than on the Redmond workload, but the authors have not sufficiently explained the reasons.

2. Caching technologies for web applications, C. Mohan (IBM), VLDB 2001

In this presentation, the author gives an overview of caching techniques, ranging from issues such as caching models to some popular vendors to the DBCache Project at IBM Almaden.
The author gives some design choices, such as caching in the middle tier, and explains the choice of the model and the process of designing tables and executing queries on them. The most interesting part here seems to be the mobile and web caching, and the architecture diagram was comprehensive in that it took every possible device into consideration. The web caching as described seems to provide flexibility in the placement and choice of the databases.

Pros/Cons:
1. The author has given a thorough survey of the types of caching in existence at that time, and the reasons and design choices are pretty clear.
2. Some of the main points in the design choice of the data model seem to be simplicity of design, distributed processing, and the handling of tables, views and queries.

From: Rahul Malik [rmalik4@uiuc.edu]
Sent: Thursday, April 17, 2008 8:49 AM
To: Gupta, Indranil
Subject: 525 review 04/17

Squirrel: A decentralized peer-to-peer web cache

SUMMARY: In this paper, the authors present a scheme by which web browsers can form a decentralized web cache by pooling their local web cache resources. Thus, there is no need for extra hardware to act as a centralized cache for the whole system. Squirrel is built on top of Pastry, which is used for decentralized object location in peer-to-peer systems as well as for identifying and routing to nodes and caching their copies. The key idea of the scheme is that when a web browser issues a GET request and the object is cacheable, the browser first checks its local cache. If it does not find a copy of the object there, it tries to locate a copy at some other node. It uses a SHA-1 hash to obtain the location of the object. Once the home node is found, two different approaches are proposed. They differ in whether the home node actually stores the object or just maintains a directory of information about a small set of nodes that store the object.
These two approaches actually represent two extremes in the design space. In terms of evaluation, they compare the two schemes with each other as well as with a centralized caching scheme. They use many different metrics to evaluate the schemes, including external bandwidth, hit ratio, latency, and load on each node.

PROS: One of the advantages of the scheme is that, by using a decentralized scheme, there is no need for a centralized web cache. This reduces hardware as well as administration costs. Also, a centralized cache is a central bottleneck; in a distributed scheme, such a bottleneck does not arise. The performance of this scheme is comparable to that of a centralized scheme. It is also very resilient to node failures and to nodes joining and leaving the system, which makes the system robust and easy to use.

CONS: One of the problems with a decentralized scheme, which has not been addressed in the paper, is the privacy and security of individual users. In a decentralized scheme, individual browsing information is disseminated through the system. Also, a malicious host can pretend that it has a web page although it may not have it.

On the scale and performance of cooperative Web proxy caching

SUMMARY: In this paper, the authors study the performance of cooperative web proxy caching and the potential advantages and disadvantages of inter-proxy cooperation. Using traces, they quantitatively evaluate the performance improvement from cooperation among a large number of small organizations as well as between large organizations. From the traces, they found that for small organizations the hit rate increases rapidly with population, while large organizations do not see such a benefit. Also, proxy assignment is good for one geographical region.
Next, they extend the results analytically to understand the steady-state performance of cooperative caching for large client populations and to speculate on caching performance in light of future trends. The model extends the results to a wider range of parameter values, including document popularity and document rate of change. Again, the results show that relatively small populations are able to achieve most of the performance benefits of cooperative caching. Finally, they compare various cooperative caching schemes with one another.

PROS: This is a very extensive trace-based study of cooperative web caching that includes both a very large number of small organizations and large organizations. Through the results, they show that cooperative caching has performance benefits only within limited population bounds and does not improve performance much for large populations. They also use the model to examine the implications of future trends.

CONS: One of the disadvantages of the paper is that it is limited to organizations, both small and large. The authors have not looked at the performance of proxy caching for local households, where web access patterns are very different from those of the organizations discussed in the paper. Also, I found their analytical model too rigid to be very expressive of actual behavior.

From: Alejandro Gutierrez [agutie01@gmail.com]
Sent: Thursday, April 17, 2008 8:46 AM
To: Gupta, Indranil
Subject: 525 review 04/17

===============================================================
"On the scale and performance of cooperative Web proxy caching"
Alec Wolman, Geoffrey M. Voelker, Nitin Sharma, Neal Cardwell, Anna Karlin, Henry M.
Levy
Reviewed by: ALEJANDRO GUTIERREZ
===============================================================

This paper from the Department of Computer Science and Engineering at the University of Washington studies the effectiveness of large-scale cooperative caching. It discusses which client population sizes benefit from cache cooperation. By analyzing traces of web requests as well as modeling requests, the authors were able to show that although cache hit ratios generally increase with the number of clients, most of the benefit of cache cooperation occurs with a relatively small number of clients. In seven-day traces, significant benefits were achieved for client populations of a little more than 2,500, and in real deployments only marginal benefits can be achieved for organizations with more than 20,000 clients.

Based on these findings, the authors discuss the following issues:
1. There is not much point in expending effort on developing large-scale cooperative caching schemes, as any scheme will do well with such small client populations.
2. Intra-organization traffic has as little homogeneity as inter-organization traffic, and clustering clients by access pattern does not yield much benefit either.
3. The performance of caching is highly limited by document cacheability, and thus this is the most promising area for caching improvements.

The authors also infer that the most popular web objects are on average smaller than unpopular ones, which in turn leads to higher hit rates per object than per byte.

I personally would find it useful to see further discussion of what inter-cache latency and bandwidth would justify cooperation, in comparison to latencies to web servers, as well as of how latencies affect the ideal client population size.

===============================================================
"Caching Technologies for Web Applications"
C.
Mohan
Reviewed by: ALEJANDRO GUTIERREZ
===============================================================

This paper from the most important research lab of IBM, the Almaden Research Center, presents a terrific summary of how caching works and how caching technologies can be targeted towards Web applications, taking into account the necessary requirements. On a side note, this paper was presented one day after a very important date worldwide.

It does a great job of explaining the motivation behind caching technologies and the kind of improvements that can be seen in many contexts, such as processor caches in hardware, buffer pools in database management systems, and even web proxy servers. It then targets the rest of the paper at e-commerce transactional/database applications, not Internet search, etc.

The paper then goes into detail about "what to specifically cache?" It is useful to know not only what to cache, but also where to cache the information. There is always the need for a caching policy as well as authentication, routing, etc. I appreciated that the paper introduced middle-tier cache requirements as well as cache refresh. It also raises the question of how efficient the caching technique is in a distributed system.

The paper also does a great job of presenting case studies of projects from big companies, such as IBM WebSphere Commerce Suite, the Olympics, and eBay. It concludes with a very interesting summary of challenges. It is fascinating how some of those challenges are not solved today, allowing for very cool research to be done in such interesting research areas.

From: Riccardo Crepaldi [crepric@gmail.com]
Sent: Thursday, April 17, 2008 2:49 AM
To: Gupta, Indranil
Subject: 525 review 04/17

Squirrel: A decentralized peer to peer web cache

Squirrel is a distributed peer-to-peer application. The authors developed this system to decentralize the caching of web pages.
Normally a web browser keeps a cache of the last URLs visited, if they are marked as cacheable. If the user browses the same URL again, the cached copy is validated by checking whether any update has happened at the web server; if not, the page is fetched from local storage instead of from the Internet. This process is decentralized in Squirrel. This solution is shown to be very useful in large intranets where the bandwidth available between local hosts is one or more orders of magnitude higher than the bandwidth available to the Internet.

A Pastry DHT is implemented and cached objects are stored according to the Pastry protocol. When a client makes a request, it first contacts the host that is responsible for the object according to the hashing function. If that host does not have a cached copy of the object, it fetches it from the Internet. The paper proposes two approaches for storage. In the first, the node that is responsible for an object (the home node) stores the cached copy of the object. In the second approach (directory), the node only stores information about who made the first request for an object, while the cached copy is stored at the client. The protocol relies on Pastry for fault tolerance and for the node join and leave procedures.

The results show, unsurprisingly, how the distributed caching approach provides better performance in terms of storage and cooperation (a cached copy serves many clients). The home node approach performs better than the directory approach, despite its simplicity. The directory approach is more complicated, most of all when it has to deal with failures.

I liked the paper: it reads very well and the results are presented in a very clear way, with a lot of clarifications. The system is powerful and efficient. However, this approach is suitable only in large networks where nodes are not behind a firewall. In this case security is a big issue, and the distributed cache seems too easy to modify.
A malicious node could disseminate wrong information while pretending to forward a cached object. This is probably why a single, secure and failure-free proxy server could be a valid alternative.

From: fariba.mahboobe.khan@gmail.com on behalf of Fariba Khan [fkhan2@uiuc.edu]
Sent: Thursday, April 17, 2008 2:18 AM
To: Gupta, Indranil
Subject: 525 review 04/17

Squirrel: a decentralized, peer-to-peer web cache, Sitaram Iyer, Ant Rowstron, Peter Druschel, PODC 2002

The authors propose p2p web caching in a corporate LAN, which would replace the traditional proxy server. Squirrel works on top of Pastry. The benefits of a p2p web cache are that no additional hardware is needed, there is no administrative overhead (who maintains Squirrel and Pastry?), it scales with the growth in clients, and there is no single point of failure. They propose two methods to implement Squirrel on Pastry: home-store and directory. In home-store, the object (web page) is stored at the node that the object-id hashes to (the home node). In directory, the home node only stores pointers to the clients that have stored the object recently. They simulate the protocol on MSR corporate traces from Redmond and Cambridge. Results show that with as little as 10 MB of per-node storage, the external bandwidth required (the need to go out of the LAN for a page) is close to that of a centralized cache. Home-store performs better as it gets natural load balancing through Pastry. Hop counts are moderate (4-6).

Discussion: Good overview of web caching and Pastry. Is there a way to make directory work better? Could the small number of stored pointers (4) be the reason for the not-so-good performance? Is there a more direct way to compare a centralized and a p2p web cache? The reasons stated in the paper don't seem good enough to me. Hardware is cheap and admins will be needed. I would prefer a performance comparison, but am not sure what could be analyzed.
--
Fariba Khan
PhD candidate
Illinois Security Lab
University of Illinois
http://www.cs.uiuc.edu/homes/fkhan2

From: dkassa2@uiuc.edu
Sent: Thursday, April 17, 2008 1:48 AM
To: indy@cs.uiuc.edu
Subject: 525 review 17/04

==============================================================================
Review 25:
Paper title: On the Scale and Performance of Cooperative Web Proxy Caching

The paper studies a scheme in which proxy caches cooperate to scale the client population and thereby increase the hit rate, reducing latency, network utilization and server load. The paper uses multi-organization traces to evaluate cooperative proxy caching at small and medium scales. It also uses analytic modelling to evaluate cooperative caching at scales beyond those available in the traces.

A simultaneous trace of 200 diverse client organizations at the University of Washington (UW) was used in the study. Treating each UW organization as an independent company, the work analyses how much Web document reuse there was between these organizations by placing a proxy cache in front of each organization. It also analyses the benefit of cooperative caching among these 200 proxies. The study shows that the average ideal hit rate increases from 43% to 69% with cooperative caching, assuming infinite storage and ignoring cacheability and expirations. In addition, the paper shows that the average cacheable hit rate increases from 20% to 41% with (perfect) cooperative caching, and that the hit rate increases logarithmically with client population.

The paper concludes that the largest benefit is achieved with small populations (up to 2K-5K clients) and that there is limited benefit from cooperation when the UW and Microsoft populations are combined. I am not sure how the approach handles security and privacy issues.
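The "ideal" hit rate mentioned in the review above (one shared cache, infinite storage, cacheability and expiration ignored) can be illustrated with a small sketch. The trace format, the function name `ideal_hit_rate`, and the toy data are assumptions made for illustration; this is not the paper's tooling or data.

```python
def ideal_hit_rate(trace, population):
    """Hit rate of one shared cache with infinite storage and no expiry.

    trace: iterable of (client_id, url) pairs in request order (assumed format).
    population: set of client ids that share the cache.
    """
    seen = set()          # URLs already requested by anyone in the population
    hits = total = 0
    for client, url in trace:
        if client not in population:
            continue      # request from a client outside this population
        total += 1
        if url in seen:
            hits += 1     # someone in the population fetched it earlier
        else:
            seen.add(url)
    return hits / total if total else 0.0


# Toy trace, invented for illustration only.
TRACE = [("a", "u1"), ("b", "u1"), ("a", "u2"),
         ("c", "u2"), ("b", "u3"), ("c", "u1")]

if __name__ == "__main__":
    for pop in ({"a"}, {"a", "b"}, {"a", "b", "c"}):
        print(sorted(pop), ideal_hit_rate(TRACE, pop))
```

On this toy trace the hit rate grows as clients are added to the shared population, because later clients re-request URLs that earlier ones already pulled in.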
=============================================================================
Review 26:
Paper title: Squirrel: A decentralized peer-to-peer web cache

The paper presents a decentralized peer-to-peer web cache in which every computer in the intranet takes on a small part of the web cache's job. Desktops cooperate in a peer-to-peer fashion and share mutually, so hosts both browse and cache. This scheme has the advantage that no additional hardware is used. More users in the system means more resources, so the system scales automatically. It also self-organizes and is easy to deploy. The paper also discusses ways of mapping Squirrel onto Pastry.
===========================================================================

From: ysarwar@gmail.com on behalf of Yusuf Sarwar [mduddin2@uiuc.edu]
Sent: Thursday, April 17, 2008 1:13 AM
To: Gupta, Indranil
Subject: 525 review 4/17

Squirrel: A decentralized peer-to-peer web cache
S Iyer, A Rowstron and P Druschel

The paper presents a peer-to-peer web cache for the Internet. Usually, when a browser gets a URL, it first checks whether its local cache has a fresh copy of the file. If so, the copy is immediately displayed in the browser. If it has no cached copy, or the copy may be stale, it queries the origin web server. The authors here build a peer-to-peer cache where pages are stored at other peers, and a node that misses in its own cache contacts another node for the page before going to the origin server.

The Squirrel implementation is simple. The browser no longer has a cache of its own (the cache option is disabled in the browser); instead, a Squirrel proxy runs on the same machine. Whenever a URL is entered, the browser first forwards the request to the local Squirrel proxy to check whether its cache stores the page. If yes, the page is fetched from the cache.
Otherwise, the Squirrel proxy forwards the URL to the peer whose id most closely matches a key obtained by hashing the URL. This node is called the home node. Two types of cache management are proposed, home-store and directory. In the first, when a local cache miss happens, the proxy forwards the request to the home node over the Pastry routing substrate. In the other scheme, called directory, the home peer keeps pointers to the nodes that have accessed a certain object. These nodes are called delegates. When a request for one of the objects is received, it is forwarded to one of these nodes in the hope that it still has the object cached. A directory is thus maintained to track where objects are cached, and requests are forwarded accordingly.

Web caches require a lot of space, so one may wonder how much disk space each peer can dedicate. Another concern is latency. The home node is determined just by a hashed key of the URL, so it does not provide latency-efficient delivery of cached pages. And sometimes routing a request to the home node produces nothing but a cache miss at the end, wasting all of that traffic.

============================================================

From: Daniel Rebolledo Samper [rebolledodaniel@gmail.com]
Sent: Wednesday, April 16, 2008 11:00 PM
To: Gupta, Indranil
Subject: 525 review 04/17

SQUIRREL: A DECENTRALIZED, PEER-TO-PEER WEB CACHE

This paper presents a peer-to-peer web cache called Squirrel built on top of Pastry: each node in the system runs an instance of Squirrel, and we assume that there is no distinction between the Squirrel cache and the node's cache. When a node requests a web page, it queries the Squirrel proxy on its machine. The proxy, in turn, first queries the machine's local cache and, if it does not find a copy of the web page, queries another node in the system. Squirrel maps URLs to Pastry keys by calculating their SHA-1 checksum.
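A minimal sketch of the URL-to-key mapping described above, under stated assumptions: the key is taken to be the first 128 bits of the SHA-1 digest, the id space is treated as circular, and the home node is the live node whose id is numerically closest to the key. The function names and the 128-bit truncation are choices made for this illustration, and real Pastry routing (prefix-based, multi-hop) is not modeled.

```python
import hashlib

ID_BITS = 128                 # assumed id width for this sketch
ID_SPACE = 1 << ID_BITS


def object_id(url: str) -> int:
    """Map a URL to a key: first ID_BITS bits of its SHA-1 digest."""
    digest = hashlib.sha1(url.encode("utf-8")).digest()   # 20 bytes
    return int.from_bytes(digest[:ID_BITS // 8], "big")


def circular_distance(a: int, b: int) -> int:
    """Distance between two ids on the circular id space."""
    d = abs(a - b)
    return min(d, ID_SPACE - d)


def home_node(url: str, node_ids):
    """Pick the node id numerically closest to the URL's key.

    Ties are broken by the smaller node id (an arbitrary choice here).
    """
    key = object_id(url)
    return min(node_ids, key=lambda n: (circular_distance(n, key), n))
```

Because the key depends only on the URL, every client that computes it independently arrives at the same home node, as long as they agree on the set of live nodes.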
The values stored in the Pastry overlay may be the actual web pages or pointers to the machines that store them. We call the first approach "home-store", and the second, "directory". In the former, all queries for a given URL go through its home node, that is, the node responsible for the corresponding key, and only it may contact the origin server. This seemingly simple technique opens up the possibility

In the latter approach, each node stores, for each URL it is responsible for, a small number of pointers to the last nodes that requested it. We call these "delegates". Subsequent requests are redirected to one of these nodes.

Because of Pastry's self-healing properties, and since loss of cached data can be tolerated, Squirrel is able to tolerate nodes joining and leaving. However, the paper does not discuss the effects of high rates of churn (which would probably not be applicable to an enterprise setting anyway).

Trace-driven experiments reveal that the home-store approach is more effective in reducing external bandwidth than the directory approach for a given per-node cache size. This is because nodes that visit many different items fill up their caches and lose files that other nodes could benefit from. Latencies are typically small, thanks to the nature of the network the authors assume. Predictably, the home-store approach imposes a higher load on nodes.

Peer-to-peer web caching seems like a very contentious proposition in today's world: what are the incentives for end-users to spend their precious upload bandwidth on sharing web pages, when they would probably much rather use it to boost their rating or download speed on BitTorrent? The article avoids this discussion by taking the example of a single administrative domain (e.g. a large corporation) with a local network.
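The difference between the two approaches summarized in this review can be sketched with a toy in-memory model. Everything here is an assumption for illustration: one object stands in for all home nodes, freshness checks and cache size limits are ignored, the requesting client's own cache lookup is skipped, and the directory tries delegates newest-first rather than following whatever selection rule the paper uses. `fetch_origin` is a hypothetical callback standing in for a request to the origin server.

```python
K = 4  # pointers kept per directory entry (the example value the reviews mention)


class HomeStore:
    """Home-store: the home node keeps the object itself."""

    def __init__(self, fetch_origin):
        self.fetch_origin = fetch_origin
        self.objects = {}                 # url -> body, held at the home node

    def get(self, url, client):
        if url not in self.objects:       # miss: only the home node goes to the origin
            self.objects[url] = self.fetch_origin(url)
        return self.objects[url]


class Directory:
    """Directory: the home node keeps up to K pointers to recent requesters."""

    def __init__(self, fetch_origin):
        self.fetch_origin = fetch_origin
        self.delegates = {}               # url -> client ids, most recent last
        self.client_caches = {}           # client id -> {url: body}

    def get(self, url, client):
        body = None
        for delegate in reversed(self.delegates.get(url, [])):
            body = self.client_caches.get(delegate, {}).get(url)
            if body is not None:          # this delegate still holds a copy
                break
        if body is None:                  # no delegate has it: go to the origin
            body = self.fetch_origin(url)
        self.client_caches.setdefault(client, {})[url] = body
        pointers = self.delegates.setdefault(url, [])
        if client in pointers:
            pointers.remove(client)
        pointers.append(client)           # requester becomes the newest delegate
        del pointers[:-K]                 # keep only the K most recent pointers
        return body
```

In this model, deleting an entry from a delegate's cache (as eviction on a busy client would) forces the directory variant back to the origin, whereas the home-store variant is unaffected; that is the effect the review cites for home-store's lower external bandwidth.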
ON THE SCALE AND PERFORMANCE OF COOPERATIVE WEB PROXY CACHING

This paper studies the advantages and drawbacks of inter-proxy cooperation on two scales: when each cache serves a small population, and when each serves a larger population. The authors use simultaneous traces from the University of Washington and Microsoft Corporation. They also present a model of web access that allows them to extrapolate the results to much larger populations.

The first part of the paper studies the traces, and the simulations are optimistic in the sense that the simulated caches have infinite size and objects do not expire. Predictably, their simulations show log-like, diminishing marginal increases in the hit rate and the byte hit rate (hit rate weighted by file size) as the population increases (in other words, as we combine caches). Cache cooperation does not significantly improve latency either.

The paper then studies the locality of requests and the impact of combining the caches of people with different interests. To do this, the authors replay the UW log as if there were one proxy per department. They conclude that the requests have little actual locality. They also tried to cluster the users by optimizing intra-cluster sharing, but the difference in average hit rate is still less than 5% relative to random clusters. Finally, they conclude that combining caches is only worthwhile if they serve relatively few people.

The authors then present a model of web access: page popularity follows a Zipf distribution. Users act independently of each other and the number of requests follows a Poisson process. The number of changes to a web page also follows a Poisson process whose rate depends on whether the page is popular or unpopular. Documents are cacheable with some independent probability (this is questionable in today's "web 2.0" world), etc. The upshot is that they can obtain formal expressions for the (asymptotic) hit rate, last-byte latency and bandwidth savings.
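A rough simulation sketch of the Zipf part of such a model, under stated assumptions: it keeps only Zipf-like popularity and independent users, and leaves out the Poisson arrival and change processes and the cacheability probability. The parameter values (5000 documents, exponent 0.8, 20 requests per client) and the function names are invented for illustration and are not taken from the paper.

```python
import random


def zipf_weights(num_docs, alpha):
    """Unnormalized Zipf-like weights: the doc of rank r gets 1 / r**alpha."""
    return [1.0 / (rank ** alpha) for rank in range(1, num_docs + 1)]


def simulated_hit_rate(num_clients, requests_per_client=20,
                       num_docs=5000, alpha=0.8, seed=0):
    """Hit rate of one infinite shared cache under Zipf-like requests.

    Clients are independent and identical, so a population of N clients
    is modeled simply as N * requests_per_client draws from the same
    popularity distribution.
    """
    total = num_clients * requests_per_client
    if total == 0:
        return 0.0
    rng = random.Random(seed)             # fixed seed for repeatability
    requests = rng.choices(range(num_docs),
                           weights=zipf_weights(num_docs, alpha), k=total)
    seen = set()
    hits = 0
    for doc in requests:
        if doc in seen:
            hits += 1                      # already cached by the population
        else:
            seen.add(doc)
    return hits / total


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, round(simulated_hit_rate(n), 3))
```

Running it for growing populations gives a feel for the diminishing-returns shape the reviews describe; the exact numbers depend entirely on the assumed parameters.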
They conclude that there is a transition range (around 10^4-10^5 users) where cache cooperation greatly improves performance.

My criticism is that some of the main results of the first part, namely diminishing marginal returns, stem from the fact that the hit rate increases logarithmically with the number of clients, which the related work had already shown extensively. Still, it's nice to have even more empirical validation of this behavior.

From: Qiyan Wang [qwang26@uiuc.edu]
Sent: Wednesday, April 16, 2008 10:35 PM
To: Gupta, Indranil
Subject: CS525 review 4/17

Caching 4/17
Qiyan Wang
qwang26@uiuc.edu

On the scale and performance of cooperative web proxy caching

Summary: Cooperative caching is used to improve the performance of file and virtual memory systems in a high-speed, local-area network environment, based on the fact that network transfer time is much smaller than the disk access time required to service a miss. In this paper, the authors address the question of whether multiple proxies should cooperate with each other in order to increase the total client population, improve hit ratios, and reduce document-access latency; whether such cooperative proxy caching is a useful architecture for improving performance depends on a number of factors. The paper describes two approaches to exploring the limits and potential of cooperative proxy caching. One is trace-based analysis, where they collect and analyze traces from two environments. The other is an analytic model of Web behavior that extends beyond the limits of the trace results. Their results show the benefits of cooperative caching among collections of small organizations. However, they show that cooperative caching is unlikely to have significant benefits for larger organizations or populations.
They claim that there is little point in designing highly scalable cooperative-caching schemes, and that all reasonable schemes will have similar performance in the low-end population range where cooperative caching works.

Pros:
Shows there is little point in continuing to expend effort on the design and evaluation of highly scalable, cooperative-caching schemes.
Proposes an approach to evaluate the benefit of cooperative caching in different environments.
Proposes a model to examine the implications of future trends in web-access behavior and traffic.

From: marefin2@uiuc.edu
Sent: Wednesday, April 16, 2008 10:20 PM
To: Gupta, Indranil
Subject: 525 review 04/17

Review: Squirrel: A Decentralized Peer-to-Peer Web Cache
S. Iyer, A. Rowstron, and P. Druschel

This paper presents a decentralized web-caching technique called Squirrel. The main idea of this approach is to share the web caches of all connected peers among all the nodes in the overlay. Thus each node performs web browsing as well as web caching. There is no central cache in this design. They propose two approaches: the home-store approach and the directory approach. The underlying network topology is the same as Pastry's. Nodes and web objects are given ids using the hash function.

In the home-store approach, objects are stored at the node with the numerically closest nodeId (called the home node). If the client cache has a stale copy or none at all, it issues a cGET or GET request, respectively, to the home node. If the home node has a fresh copy, it directly replies with the copy or a not-modified message, as appropriate. If the home node finds a stale copy in its cache or there is a cache miss, it issues a cGET or GET request to the origin server, respectively.
If the origin server replies with a not-modified message or a cacheable object, the home node revalidates the local copy or stores the object, respectively, and then forwards the suitable response to the client.

In the directory approach, the home node keeps a list of the other nodes (called delegate nodes) that have a copy of the object, rather than storing the object itself. The home node also maintains all the metadata of the object needed to validate it. For non-cacheable objects, requests are forwarded directly to the origin server. The home node ensures that all delegate nodes hold the same up-to-date copy. Requests can be served by the home node (if the ETag matches and the object is not modified), by the delegate nodes, or by the origin server. After getting the updated object, the client becomes a new delegate node and reports the metadata to the home node (if necessary).

Some noteworthy points:
· To improve performance, when a new node joins, the two neighboring nodes (by node id) exchange their objects or directories with the new node. This improves performance in the sense that any request that ends up at the new node will eventually succeed, but in the home-store approach it may be expensive because the objects themselves have to be moved.
· In the modified protocol, each node limits the number of connections to itself and, when that limit is reached, replicates the objects to the nodes with the nearest node ids.
· Squirrel has the advantages of being inexpensive, highly scalable, and resilient to node failures, and it requires less administrative overhead.
· From the graph of external bandwidth vs. per-node cache size, it is clear that the home-store approach performs better than the directory approach. This is because the home-store method uses a hash function to distribute the load uniformly, while the directory method stores the objects at the requesting client node.
Cache eviction at heavily browsing client nodes therefore leads to more cache misses and higher external bandwidth use than in the home-store approach.
· On node failure, the system simply forgets the objects that node stored. This could be improved considerably by replicating objects at the nodes with the closest nodeIds. They have not talked about such replication in this paper.
· In the directory approach, each home node stores up to K pointers to the nodes that have most recently accessed the object. The paper gives no indication of how to choose this value or why 4 is used as the example.
· As Squirrel merges the browser cache and the Squirrel cache into one, local cached pages (in use by the local client) may be evicted to make room for global cache storage for other clients in the network, even though the pages are not too old. This seems unfair to me.

From: Mirko Montanari [mirko.montanari@gmail.com] on behalf of Mirko Montanari [mmontan2@uiuc.edu]
Sent: Wednesday, April 16, 2008 9:48 PM
To: Gupta, Indranil
Subject: 525 review 04/17

Squirrel: A decentralized peer-to-peer web cache
Sitaram Iyer (Cambridge), Antony Rowstron and Peter Druschel (Microsoft)

This paper describes a distributed implementation of a Web proxy server. The system is meant to be used within an enterprise network: instead of building a centralized proxy server that responds to the requests of all the machines within the network, the authors propose to use a P2P system to distribute the load of storing pages and responding to requests among all the machines in the network. The system that the authors propose is based on Pastry. The authors describe two approaches: a simple approach and a more complicated approach. Experimental results show that the simple approach gives better performance in almost all cases. In this simple approach, a client issues a request for a URL.
The request goes to the local Squirrel server, which computes HASH(URL) and interprets the result as a nodeId A in the Pastry system. The request is forwarded to A. If A does not have a copy of the page, it fetches the page from the origin website, stores it locally, and returns it to the requesting node.
PROS:
- The simple approach is very easy to implement. The authors also propose a solution that can be used to reduce the load that popular pages place on a single node.
CONS:
- The authors do not address security at all. A node can respond to a request with a fake page containing whatever information it wants to give.
- The authors probably assume that security does not need to be addressed within an enterprise network, but in the large networks of large organizations (e.g., universities) it could be a problem.
From: Hengzhi Zhong [hzhong@uiuc.edu] Sent: Wednesday, April 16, 2008 8:17 PM To: Gupta, Indranil Subject: 525 review 04/17
Squirrel: A decentralized peer-to-peer web cache
Summary: This paper presents Squirrel, a decentralized P2P web cache. The web browser cache on each desktop is treated as a local cache, and together these caches form the shared web cache. The setting is a P2P approach to web caching in a corporate LAN environment located in a single geographical region. There are two schemes in Squirrel. In the home-store scheme, Squirrel stores objects both in the client caches and at the object's home node. In the directory scheme, the home node for an object keeps a directory of up to K pointers to the nodes that have most recently accessed the object. Both schemes need to cope with membership changes. The experiments compare the performance of home-store, directory, and a centralized web cache. The metrics are latency, external bandwidth, hit ratio, load and storage per node, and fault tolerance. Two different traces are used for evaluation. In terms of latency, Squirrel's overhead is small on a cache hit, and on a miss it is overshadowed by the latency to the origin server.
In terms of load per node, the directory scheme shows a wider range of variation than home-store. Overall, the home-store scheme fares better than the directory scheme.
Pros:
1. This paper proposes a P2P approach to web caching. It uses the web browser on each desktop to do this.
Cons:
1. How should K be set? Why is K 4? This is not clear. Would it matter if K were larger?
2. The paper should include some experiments on the effects of frequent membership changes.
3. How is load balancing done in a P2P web cache?
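The home-store and directory schemes summarized in these reviews can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation. The names `url_key`, `home_node`, `HomeStoreNode`, and `DirectoryNode` are hypothetical. SHA-1 truncated to 32 bits stands in for the URL hash, and a numerically-closest search over a list of node IDs stands in for Pastry routing. `fetch_origin` is a placeholder for contacting the origin server. ETag revalidation is omitted, and the directory sketch redirects to the most recent delegate purely to keep the example deterministic. K = 4 is the example value the reviewers mention.

```python
import hashlib
from collections import deque
from typing import Callable, Dict, Optional, Sequence

ID_BITS = 32  # assumption: small ID space for illustration only


def url_key(url: str) -> int:
    """Hash a URL into the node-ID space (SHA-1 truncated to ID_BITS)."""
    digest = hashlib.sha1(url.encode("utf-8")).digest()
    return int.from_bytes(digest[: ID_BITS // 8], "big")


def home_node(url: str, node_ids: Sequence[int]) -> int:
    """Return the live node ID numerically closest to the URL's key.

    Stand-in for overlay routing: distance is measured on a circular
    ID space, and ties are broken by the smaller node ID.
    """
    key = url_key(url)
    space = 1 << ID_BITS

    def ring_distance(node_id: int) -> int:
        d = abs(node_id - key)
        return min(d, space - d)

    return min(node_ids, key=lambda n: (ring_distance(n), n))


class HomeStoreNode:
    """Home-store scheme: the home node stores the object itself."""

    def __init__(self) -> None:
        self.cache: Dict[str, bytes] = {}

    def get(self, url: str, fetch_origin: Callable[[str], bytes]) -> bytes:
        if url not in self.cache:
            # Miss at the home node: fetch from the origin and keep a copy.
            self.cache[url] = fetch_origin(url)
        return self.cache[url]


class DirectoryNode:
    """Directory scheme: the home node keeps up to K delegate pointers."""

    def __init__(self, k: int = 4) -> None:
        self.k = k
        self.delegates: Dict[str, deque] = {}

    def lookup(self, url: str, client_id: int) -> Optional[int]:
        """Return a delegate to redirect the client to, or None on a miss.

        The client is then recorded as the most recent delegate; once K
        pointers are held, the oldest one is dropped.
        """
        entries = self.delegates.setdefault(url, deque(maxlen=self.k))
        others = [d for d in entries if d != client_id]
        target = others[-1] if others else None
        if client_id in entries:
            entries.remove(client_id)
        entries.append(client_id)
        return target
```

In this sketch a client would call `home_node(url, live_ids)` to find the node responsible for a URL, then either call `get` on that node (home-store) or call `lookup` and contact the returned delegate (directory). A `None` result from `lookup` corresponds to the case where the client must go to the origin server itself.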