CS 241: System Programming

Machine Problem 5
Persistence (Network Servers)

Deadline: Monday, 1 May, 4:00pm
(electronic submission)

Solution: Makefile, persister.c, all given code (meta.c)

Description

Let's have fun with networks!

Our goal is to extend MP3 into a web-based peer-to-peer file searcher. We've provided a slightly modified MP3 solution, to which you will add web server capabilities and UDP searching. The peer-to-peer network is a star network, with a simple directory-based server (the metaserver) in the middle. You will instruct your solution to connect to the server we have written, and through it reach the other solutions. We have also written a solution of our own, so you can test yours without waiting for other students to complete the assignment.

Be sure to read the sections and examples we point out here, and check back for clarifications (listed at the bottom).

Following R&R Chapter 18, you can write a TCP server that behaves much like Program 18.1. Change a few lines to get a request sent by a web browser, add a few lines to send a response, and you'll be able to connect to your server through a web browser (Internet Explorer, Firefox, etc.). You only need to support two pages (requests): one that shows a search box, and a second that searches your index (part 1) and everyone else's (part 3).

Following R&R Chapter 20, you can write a UDP request-reply server (and client) that works much like Programs 20.3 and 20.4. We have defined a protocol to follow, and using it, you will be able to access our peer-to-peer network of MP5 solutions.

When all is complete you will be able to use your web interface to search through all of the other solutions. Communication will start and end at your web browser, following this Figure.

That basically does it. If your solution can communicate with and search the peer-to-peer network, and can be searched by our solution, it will receive a high grade. Be sure to remember concepts you've learned throughout this course, such as pthreads, synchronization primitives, and general good programming principles. Also remember that some assumptions we made before are no longer valid: the program is persistent, so you can't just waste memory; you need to free it properly.

Part 1: Web Server

A web server is nothing more than a TCP server that takes a well-formed HTTP request (header) and replies with a well-formed HTTP response. In helper.h, you will find two functions, getHTTPRequest and sendHTTPReply. When you call getHTTPRequest, a struct Request and a list of struct HeaderList entries will be allocated for you. The request can be either a GET or a HEAD, and its type is stored in the field req; pass this value into sendHTTPReply in order to give the proper response. Google and RFC 1945 are good resources for learning more about HTTP.

We will talk in-depth about UICI in discussion section 11. After understanding that, you should be able to modify Program 18.1 in R&R to work as your web server.

Start with the given code, persister.c, and write code that spawns a thread that will be the web server. Open the port passed in at the command-line using u_open. Go into an infinite loop with u_accept. When you get a new connection, you can process it immediately or spawn a new thread for it (or anything more complex that works). To process it, call getHTTPRequest from the helper functions. Based on the resource field of the struct that is returned, do one of two things:
  1. If it's a search request, perform a local search and return the results using sendHTTPReply.
  2. For any other request, return a webpage that contains a search box and submit button.
In order to allow the user to search, we recommend using HTML forms and the GET method. For example, if you set the action field of the FORM tag to "go", a search for "find all" in a text box named "it" will be sent as an HTTP request for the address "/go?it=find+all" (assuming that the root directory was initially requested). This resource string should be easy to parse into the search terms (processString has been updated to convert +'s to spaces). You can make a text box using the INPUT tag with its type field set to "text". More on HTML and forms can be found at W3 Schools, and there are many examples reachable through Google.
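As a rough illustration of the form and the parsing step described above, here is a minimal sketch. The names SEARCH_PAGE and extract_query_value are hypothetical and not part of helper.h; the action "go" and field name "it" are taken from the example. The value is returned raw (with any '+' left in place) on the assumption that processString will handle the conversion, and percent-encoded characters are not decoded.

```c
#include <stdio.h>
#include <string.h>

/* A minimal search page: one text box named "it" and a submit button.
 * Submitting "find all" makes the browser request /go?it=find+all. */
static const char SEARCH_PAGE[] =
    "<html><body>\n"
    "<form action=\"go\" method=\"get\">\n"
    "  <input type=\"text\" name=\"it\">\n"
    "  <input type=\"submit\" value=\"Search\">\n"
    "</form>\n"
    "</body></html>\n";

/* Copy the raw value of query parameter `name` from a resource string
 * such as "/go?it=find+all" into `out` (truncating if needed).
 * Returns 1 if the parameter was found, 0 otherwise.
 * Hypothetical helper: '+' and %xx sequences are left untouched. */
int extract_query_value(const char *resource, const char *name,
                        char *out, size_t outsize)
{
    const char *p = strchr(resource, '?');
    size_t namelen = strlen(name);
    size_t i = 0;

    if (p == NULL || outsize == 0)
        return 0;
    p++;                                   /* skip the '?' */

    while (*p != '\0') {
        /* Does this key=value pair start with "name="? */
        if (strncmp(p, name, namelen) == 0 && p[namelen] == '=') {
            p += namelen + 1;
            while (*p != '\0' && *p != '&' && i + 1 < outsize)
                out[i++] = *p++;
            out[i] = '\0';
            return 1;
        }
        /* Otherwise skip ahead to the next pair, if any. */
        p = strchr(p, '&');
        if (p == NULL)
            break;
        p++;
    }
    return 0;
}
```

A request whose resource has no "it" parameter (for example "/") makes the function return 0, which is a convenient signal to serve the search page instead of results.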

It's simple to return a webpage: just create a character buffer and fill it with the data you want displayed as the response. Call the helper function sendHTTPReply with that character buffer (and the request type, stored in the req field of the struct returned from getHTTPRequest), and the helper function will correctly format it and send it to the specified socket.
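One way to fill such a buffer without overrunning it is to append with snprintf and track how much space is used. The sketch below is only an illustration: struct hit and build_results_page are hypothetical names, the filename:offset layout follows the MP3 result format, and the keyword is not HTML-escaped. The finished buffer would then be handed to sendHTTPReply.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for one local search result. */
struct hit {
    const char *filename;
    long offset;
};

/* Build a simple results page in `buf`.
 * Returns the page length, or -1 if `buf` is too small. */
int build_results_page(char *buf, size_t bufsize, const char *keyword,
                       const struct hit *hits, size_t nhits)
{
    size_t used;
    size_t i;
    int n;

    n = snprintf(buf, bufsize,
                 "<html><body><h1>Results for %s</h1>\n<ul>\n", keyword);
    if (n < 0 || (size_t)n >= bufsize)
        return -1;
    used = (size_t)n;

    for (i = 0; i < nhits; i++) {
        /* Append one list item per result after what is already written. */
        n = snprintf(buf + used, bufsize - used, "<li>%s:%ld</li>\n",
                     hits[i].filename, hits[i].offset);
        if (n < 0 || (size_t)n >= bufsize - used)
            return -1;
        used += (size_t)n;
    }

    n = snprintf(buf + used, bufsize - used, "</ul>\n</body></html>\n");
    if (n < 0 || (size_t)n >= bufsize - used)
        return -1;
    used += (size_t)n;

    return (int)used;
}
```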

After servicing the request, call freeRequest on the old struct to properly free that memory, and then wait for another connection.

Part 2: Serving Requests from Other Servers

The first step to being able to search other servers is making your solution searchable by the metaserver. You can reference the Figure to understand better what piece we are making. Above in Part 1, we wrote steps 1 and 5. We are skipping steps 2 and 4 for now, and working on 3a and 3b.

We have developed a protocol much like the HTTP protocol you worked with above. You will need to follow it exactly in order to communicate with the metaserver. Messages relevant to this part are listed here:
  • HELLO [name] [queryport]
    Once your program has started, send this message to the metaserver. name is a command-line argument and will be a unique identifier for your program (I would suggest using a netid or something similar). queryport is another command-line argument which represents the UDP port that your solution will be listening on for connections. If a new connection is made using a name previously listed, the metaserver will update its records to point to the new connection.
  • KEEPALIVE [name] (optional)
    Our metaserver will only search clients that have contacted it in the last 15 minutes. For that reason, we recommend you send this every 10-15 minutes, just as a way of telling the server that you are still there.
  • GOODBYE [name] (optional)
    If your server is capable of gracefully shutting down, we recommend using this message to signal to the metaserver that you will no longer be reachable for searches.
  • SEARCH [id] [keyword]
    This is not a message you will send, but rather one the metaserver may send to you. If you have properly registered with the metaserver (using HELLO), and have contacted it in the last 15 minutes, the metaserver will send you a keyword it wants you to search for locally. The id specified here should be copied directly into the response (RESULT).
  • RESULT [name] [id] [filename] [offset]
    If you locate the keyword while searching your index, you should send a RESULT message back to the metaserver. Include your name and the id included with the SEARCH message. The result itself will be the same filename:offset we desired from MP3. Send these separated by a space rather than a colon. Each RESULT message will represent a single result, thus if you find 18 matches from some keyword, you should send 18 RESULT messages to the metaserver (similarly, no matches locally indicate that you should send no response to the metaserver).
Unlike data sent over TCP, a UDP packet has the benefit of being received in one piece, the same piece it was sent in. So, after doing a u_recvfrom, you have the complete buffer that was sent. Again modify persister.c, this time to spawn a thread that opens a UDP port (passed in at the command-line) and waits for messages from the metaserver.

Add some code that looks like Program 20.3 in R&R. You will wait for datagrams with u_recvfrom, match them against the protocol above, and respond using u_sendto or u_sendtohost. You can copy the u_buf_t value filled in by u_recvfrom to get the u_buf_t you need for your u_sendto call, or you can construct one yourself using functions from uiciname.h together with the command-line argument holding the server's hostname. The UDP port for the server is a global #define in helper.h (METAPORT).
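Matching a received datagram against the protocol can be as simple as looking at its leading word. The following is a sketch under assumptions: enum msg_type and classify_message are hypothetical names, and the function terminates the buffer itself because a datagram is not guaranteed to arrive with a trailing NUL byte. Only the two messages the metaserver sends (SEARCH and METARESULT) are recognized.

```c
#include <stddef.h>
#include <string.h>

/* Messages the metaserver may send to us. */
enum msg_type { MSG_SEARCH, MSG_METARESULT, MSG_UNKNOWN };

/* Classify a received datagram by its leading keyword.
 * `buf` holds `len` received bytes in a buffer of `bufsize` bytes
 * (bufsize must be at least 1); the function NUL-terminates it first. */
enum msg_type classify_message(char *buf, size_t len, size_t bufsize)
{
    if (len >= bufsize)
        len = bufsize - 1;          /* keep room for the terminator */
    buf[len] = '\0';

    /* The trailing space makes sure the keyword is a whole word. */
    if (strncmp(buf, "SEARCH ", 7) == 0)
        return MSG_SEARCH;
    if (strncmp(buf, "METARESULT ", 11) == 0)
        return MSG_METARESULT;
    return MSG_UNKNOWN;
}
```

The receive loop would pass the byte count returned by u_recvfrom as len and then branch on the result.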

For this part, we are most interested in the SEARCH request from the metaserver. When you receive this, you will perform a local search of the 3rd argument, the keyword, and for each result, send a UDP packet back to the metaserver using RESULT. The id that you return in RESULT should be the same id that was specified as the 2nd argument in SEARCH. This is necessary to let the server know where to send your result. The other fields should be obvious.
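The SEARCH and RESULT formats above map naturally onto sscanf and snprintf. This is a minimal sketch, not the reference solution: parse_search and format_result are hypothetical names, and the 39-character keyword limit comes from the clarifications at the bottom of this page. Each formatted RESULT string would be sent as its own datagram.

```c
#include <stdio.h>
#include <string.h>

#define MAX_KEYWORD 39   /* keyword limit stated in the clarifications */

/* Parse "SEARCH [id] [keyword]". Returns 1 on success, 0 otherwise. */
int parse_search(const char *msg, int *id, char keyword[MAX_KEYWORD + 1])
{
    /* %39s stops at whitespace and never writes more than 40 bytes. */
    return sscanf(msg, "SEARCH %d %39s", id, keyword) == 2;
}

/* Format "RESULT [name] [id] [filename] [offset]" into `buf`.
 * Returns the message length, or -1 if it does not fit. */
int format_result(char *buf, size_t bufsize, const char *name, int id,
                  const char *filename, long offset)
{
    int n = snprintf(buf, bufsize, "RESULT %s %d %s %ld",
                     name, id, filename, offset);
    if (n < 0 || (size_t)n >= bufsize)
        return -1;
    return n;
}
```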

After you have written this part, you should be able to start your program and point your web browser to our MP5 solution. If you have properly registered with the metaserver, a search issued to our solution will cause your program to be sent a SEARCH; and if you have completed this part correctly, the results you send to the metaserver with RESULT will be displayed on a results page from our solution (with your name in the owner column).

Part 3: Searching Other Servers

If you have completed part 1 above, your web server should be able to display search results from your local index. We desire your solution to display search results from all other solutions. This means, in reference to the Figure, we need to write steps 2 and 4.

Because of how the metaserver works, when you send it a search request, it will send a search request back to you. This means, if you have properly completed part 2 above, the results received from the metaserver will include results from your own index. Furthermore, if you are having problems with part 2 above, you can alternatively issue the same HELLO message seen above, and then search the metaserver for the results of everyone else. Combining those results with your local search would give you a complete result, which is the goal of Part 3. However, once you complete part 2, you do not need to perform any local searches except in the case a SEARCH message was received from the metaserver.

In addition to the messages above, you can use the following messages to search the network:
  • METASEARCH [name] [keyword]
    Send, along with your name, the keyword you wish to search the network for. The metaserver will send a SEARCH message to other valid servers, and then send you results. If you improperly specify your name here, you will never get your results. Similarly, if you never registered with the HELLO message, the server will not know how to contact you.
  • METARESULT [keyword] [filename] [offset] [name]
    This is a result from the metaserver. It may be a result from your local index (if part 2 works), or a result from another solution. The keyword represents the same one you searched for (so you can match it up locally if necessary). The filename and offset are the same as would have been returned for MP3. The name field of the message corresponds to the owner of the server that returned that result. You should display all of these values in your result (see demo).
Simply change the web server portion of your solution from searching the local index to sending a METASEARCH message to the metaserver. Now, you can just wait for the results, and as you get them, put them in a global structure or buffer of some sort.
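Each METARESULT datagram can be split into its four fields before it is stored. The sketch below assumes the field limits given in the clarifications (39 characters for names and keywords, 79 for filenames); struct meta_result and parse_metaresult are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

/* One result as reported by the metaserver. */
struct meta_result {
    char keyword[40];    /* up to 39 characters plus terminator */
    char filename[80];   /* up to 79 characters plus terminator */
    long offset;
    char name[40];       /* owner of the server that found the match */
};

/* Parse "METARESULT [keyword] [filename] [offset] [name]".
 * Returns 1 if all four fields were read, 0 otherwise. */
int parse_metaresult(const char *msg, struct meta_result *r)
{
    return sscanf(msg, "METARESULT %39s %79s %ld %39s",
                  r->keyword, r->filename, &r->offset, r->name) == 4;
}
```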

How do we know when we've received all responses for a particular search? We don't. Really, we don't even know if the metaserver ever received our HELLO message, let alone our METASEARCH request. This is UDP. This is how it is. We have not designed this MP to be fault-tolerant or reliable in any sense. You can examine those topics in CS 438 - Communication Networks.

Because of this, we suggest waiting some small chunk of time (say one quarter of one second, 0.25s), for responses from the metaserver. After you've waited that length of time, return the results received thus far to the web browser using sendHTTPReply as you did in Part 1.
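A sketch of the "global structure plus short wait" idea follows. It assumes the UDP thread calls results_add for every METARESULT it receives, while the web thread calls results_clear before sending METASEARCH and results_wait_and_take afterwards. All names and sizes here are hypothetical, and a single shared store like this does not separate results belonging to two searches issued at the same time.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define MAX_RESULTS 256   /* hypothetical cap on stored results */
#define MAX_LINE    200   /* roughly the maximum message size */

static pthread_mutex_t results_lock = PTHREAD_MUTEX_INITIALIZER;
static char results[MAX_RESULTS][MAX_LINE];
static int nresults = 0;

/* Web thread: empty the store before sending a new METASEARCH. */
void results_clear(void)
{
    pthread_mutex_lock(&results_lock);
    nresults = 0;
    pthread_mutex_unlock(&results_lock);
}

/* UDP thread: store one received line. Returns 1 if stored, 0 if full. */
int results_add(const char *line)
{
    int ok = 0;

    pthread_mutex_lock(&results_lock);
    if (nresults < MAX_RESULTS) {
        snprintf(results[nresults], MAX_LINE, "%s", line);
        nresults++;
        ok = 1;
    }
    pthread_mutex_unlock(&results_lock);
    return ok;
}

/* Web thread: sleep for wait_ms milliseconds, then move whatever has
 * arrived into `out` (at most max_out lines). Returns the line count. */
int results_wait_and_take(long wait_ms, char out[][MAX_LINE], int max_out)
{
    struct timespec ts;
    int n;

    ts.tv_sec = wait_ms / 1000;
    ts.tv_nsec = (wait_ms % 1000) * 1000000L;
    /* If a signal interrupts the sleep, resume with the time remaining. */
    while (nanosleep(&ts, &ts) == -1 && errno == EINTR)
        ;

    pthread_mutex_lock(&results_lock);
    n = (nresults < max_out) ? nresults : max_out;
    if (n < 0)
        n = 0;
    memcpy(out, results, (size_t)n * MAX_LINE);
    nresults = 0;
    pthread_mutex_unlock(&results_lock);
    return n;
}
```

With the suggested quarter-second budget, the web thread would call results_wait_and_take(250, ...) and then format the returned lines into the page passed to sendHTTPReply.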


So there you have it. Your solution should begin by indexing the specified directory, and spawning two threads, one for web requests, and one for UDP messages. Your solution needs to register with the metaserver using HELLO, and then properly accept web and UDP requests.
Given Code and Demo
helper.h, helper.c: A collection of helper functions that will aid in completion of this MP. The same processString from before is included and should be used in the same manner it was used in MP3 (to process the search string before calling strtok_r repeatedly on it). Also in here are the getHTTPRequest and sendHTTPReply functions discussed above.
index.h, index.c: The index portion of one of the TAs' solutions to MP3. The serialization routines have been removed as they are no longer needed.
persister.c: The indexer/searcher half of MP3, combined into one program. Unchanged, with the other files here, this will compile (gcc -o persister persister.c helper.c index.c); and when run, indexes the specified directory, then searches it for three words: include, void, and red (displaying the results to stdout). You will want to start with this code, understand it, and then modify it to work as your MP5 solution.

UICI Library: This library provides all of the functions you will need to make networking easy. You can choose to ignore them if you wish, but it will make the assignment slightly harder. We will be covering the TCP side of the library the week of 17 April in discussion, and UDP the week after.
Beej's Guide to Network Programming: A good resource for students who wish to do lower-level socket programming, or wish to know more about how the UICI functions are working.

Demo is no longer hosted
Deliverable
This assignment is expected to be completed via pair programming. You are encouraged to form a group of 2, although you may elect to work alone. Newsgroup communication about general issues is allowed. Please post any questions on the given code on the newsgroup. You should use the CSIL Linux machines to do the assignment.

You are strictly required to use C. For all parts, modify the given persister.c, adding, if necessary, any other functions or files you need to complete this assignment. Create a makefile which compiles an executable, persister, taking the same command-line arguments as the given code.
Create a README file as well, detailing any problems encountered during program design or execution. Include in the README the following information:
  1. MP number and title
  2. Your names and netids
  3. How you split the work for the entire MP
  4. Which parts of the assignment you completed
Handin
Create an archive containing the files used in this assignment (makefile, persister.c, index.c, index.h, helper.c, helper.h) along with any other files you added. Your solution should build with make without the need to add more files to your submission. The archive should be submitted to Compass under the "MP5 Handin" assignment. This must be done by 4pm on Monday, 1 May. Each hour late costs 2% credit. No submissions are accepted after 4pm on Tuesday, 2 May.
Grading Criteria
  • Part 1 - 25%
  • Part 2 - 35%
  • Part 3 - 35%
  • README and Makefile - 5%
After Part 3 has been written, Part 1 will be graded only to see that web functionality is enabled.
Clarifications
  1. To aid in design, to make the metaserver more stable, and to further specify the protocol, we have placed limitations on the variables within the messages. Names can be no more than 39 characters, any more will be ignored; keywords can be no more than 39 characters, any more will be ignored; filenames can be no more than 79 characters, any more will be ignored. The id and offset values can be assumed to be integers. Similarly, you should send a value within the range of 0-65535 for the queryport in the HELLO message. This should also be useful if you wish to use automatic arrays as the buffers for your messages (i.e., this makes the maximum message size somewhere around 200 bytes).
  2. Currently the metaserver is not running.
  3. A sample exchange between your MP5 server and the metaserver would go as shown below (the SEARCH and METARESULT lines are sent by the metaserver; all others are sent by your server). Each line is a separate UDP packet.
    HELLO abennet1-test1 6708
    KEEPALIVE abennet1-test1
    METASEARCH abennet1-test1 include
    SEARCH 14 include
    RESULT abennet1-test1 14 dirA/dirAA/file5.TXT 1716
    METARESULT include dirA/dirAA/file5.TXT 1716 abennet1-test1
    RESULT abennet1-test1 14 dirA/file4.TXT 4380
    METARESULT include dirA/file4.TXT 4380 abennet1-test1
    RESULT abennet1-test1 14 dirB/dirBA/dirBAA/file9.TXT 28486
    METARESULT include dirB/dirBA/dirBAA/file9.TXT 28486 abennet1-test1

    At some time later:
    GOODBYE abennet1-test1
  4. The req field of struct Request is not a value you should ever examine. Your program should not take any action dependent on this value, nor is this value useful for determining if the incoming HTTP request is a search or not. This field, however, must be passed as the second parameter to sendHTTPReply.
  5. METAPORT is the port the metaserver will be listening on. If you wish to receive datagrams from the metaserver, you will need to listen on a port (probably one that is not equal to METAPORT), and let the metaserver know you are listening there. One of the command-line arguments represents the query port; you should bind a UDP socket to this port in order to accept incoming connections. Using u_openudp(0) will not bind to a port.
  6. After calling u_recvfrom, the remote parameter will be set to point to the server. If you turn around and do a u_sendto with that same parameter, you will send the message to the metaserver. HOWEVER, the metaserver does not do this. If you open a new port to send a specific datagram, you will still be sent a response to the query port you specified in your HELLO datagram.
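The field limits in clarification 1 make it possible to use fixed-size buffers and to check values before sending. As a minimal sketch, the hypothetical format_hello below builds a HELLO message and rejects input outside those limits. Rejecting (rather than truncating) an over-long name, and refusing names that contain spaces, are choices made here for illustration and are not something the protocol description requires.

```c
#include <stdio.h>
#include <string.h>

#define MAX_NAME     39    /* limits from clarification 1 */
#define MAX_KEYWORD  39
#define MAX_FILENAME 79
#define MAX_MESSAGE  256   /* hypothetical size, above the ~200-byte estimate */

/* Format "HELLO [name] [queryport]" into `buf`.
 * Returns the message length, or -1 if an argument is out of range
 * or the message does not fit in the buffer. */
int format_hello(char *buf, size_t bufsize, const char *name, long queryport)
{
    size_t namelen = strlen(name);
    int n;

    if (namelen == 0 || namelen > MAX_NAME)
        return -1;
    if (strchr(name, ' ') != NULL)      /* fields are space-separated */
        return -1;
    if (queryport < 0 || queryport > 65535)
        return -1;

    n = snprintf(buf, bufsize, "HELLO %s %ld", name, queryport);
    if (n < 0 || (size_t)n >= bufsize)
        return -1;
    return n;
}
```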