Project 2, Phases 1 & 2: Reliable Transport

Phase 1 Due: Friday, Feb. 21 @ 11:59PM.
Phase 2 Due: Friday, Feb. 28 @ 11:59PM (tentative).

Introduction

Your task is to implement a reliable, stop and wait (Phase 1) and sliding window (Phase 2) transport layer on top of the user datagram protocol (UDP). You will use IP addresses and UDP port numbers to demultiplex traffic, but not otherwise rely on UDP--in particular you should not rely on UDP's checksum to detect bit errors in packets.

The assignment is split into two phases. In the first phase, you just need to support a single direct connection between two UDP ports, one on the server and the other on the client; you can use the stop and wait protocol for this. The picture below shows how the system should look after Phase 1 is implemented:

In Phase 2, you extend the functionality to support demultiplexing of several connections at the server as well as to support a sliding window with sizes > 1. (Recall that stop and wait is equivalent to sliding window with a window size of 1). The picture below shows how the system should look after Phase 2 is implemented:

In this assignment, you are provided with a library (rlib.h and rlib.c) and you have to implement some functions and data structures for which skeletons are provided (in reliable.c). You will probably find it useful to look through rlib.h and rlib.c, as several useful helper functions have been provided. These helpers abstract away many of the input and output differences between the above two pictures, as well as the I/O details, so that you can focus on the learning goals of the reliable transport protocols.

In general your implementation should:

Handle packet drops.
Handle packet corruption.
Provide trivial flow control.
Provide a stream abstraction.
Allow multiple packets to be outstanding at any time (using a limit given to your program as a run-time parameter, via the -w option).
Handle packet reordering.
Detect any single-bit errors in packets.

You will implement both the client and server component of a transport layer, both sides available in the same executable, and selected via command line options. The client will read a stream of data in (either from STDIN, in the first phase, or from a reliable TCP connection in Phase 2), break it into fixed-sized packets suitable for UDP transport, prepend a control header to the data, and transmit this packet to the server. The server will read these packets and write the corresponding data, in order, to a reliable stream (STDOUT in Phase 1, and a TCP connection in Phase 2).

One of the simplifications of this assignment over what we study in TCP is that we are implementing a packet-oriented transport, as opposed to a byte stream transport. Because we do not have to keep track of byte offsets, our sequence numbers and acknowledgements refer to whole packets rather than to bytes.

Packet types and fields

There are two kinds of packets, Data packets and Ack-only packets. You can tell the type of a packet by its length. Ack-only packets are 8 bytes, while Data packets vary from 12 to 512 bytes. The packet format is defined in rlib.h:


        struct packet {
          uint16_t cksum; /* Ack and Data */
          uint16_t len;   /* Ack and Data */
          uint32_t ackno; /* Ack and Data */
          uint32_t seqno; /* Data only */
          char data[500]; /* Data only; Not always 500 bytes, can be less */
        };
        typedef struct packet packet_t;

Every Data packet contains a 32-bit sequence number as well as 0 or more bytes of payload. The len, seqno, and ackno fields are always in network byte order (meaning you will have to use htonl/htons to write those fields and ntohl/ntohs to read them). Both Data and Ack packets contain the following fields:

cksum: 16-bit IP checksum (you can set the cksum field to 0 and use the cksum(const void *, int) function (i.e. pass it a pointer/address of a packet and a length) to compute the value of the checksum that should be in there). Note that you shouldn't have to call htons on the checksum value produced by the cksum function--it is already in network byte order.
len: 16-bit total length of the packet. This value will be 8 for Ack packets, and 12 + payload-size for data packets (since 12 bytes are used for the header). An end-of-file condition is transmitted to the other side of a connection by a data packet containing 0 bytes of payload, and hence a final len of 12. Note: You must examine the length field, and should not assume that the UDP packet you receive is the correct length. The network might truncate or pad packets, resulting in a difference between the UDP length and the length maintained by you in this field.
ackno: 32-bit cumulative acknowledgment number. This says that the sender of a packet has received all packets with sequence numbers earlier than ackno, and is waiting for the packet with a seqno of ackno. Note that the ackno is the sequence number you are waiting for, that you have not received yet. The first sequence number in any connection is 1, so if you have not received any packets yet, you should set the ackno field to 1.

The following fields only exist in a data packet:

seqno: Each packet transmitted in a stream of data must be numbered with a seqno. The first packet in a stream has seqno 1. Note that in TCP, sequence numbers indicate bytes. By contrast, this protocol just numbers packets. That means that once a packet is transmitted, it cannot be merged with another packet for retransmission. This should simplify your implementation.
data: Contains (len - 12) bytes of payload data for the application.
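The field rules above (network byte order, checksum computed with the cksum field set to 0, and a len field that must be checked against what actually arrived) can be sketched as follows. This is only an illustration under stated assumptions: demo_cksum is a hypothetical stand-in for the cksum function in rlib (a 16-bit one's-complement sum), and make_data_packet and packet_ok are made-up helper names, not part of the provided library.

```c
#include <arpa/inet.h> /* htons, htonl, ntohs */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ACK_LEN 8       /* cksum + len + ackno */
#define DATA_HDR_LEN 12 /* ACK_LEN + seqno */
#define MAX_PAYLOAD 500

struct packet {
  uint16_t cksum;
  uint16_t len;
  uint32_t ackno;
  uint32_t seqno;
  char data[MAX_PAYLOAD];
};
typedef struct packet packet_t;

/* Hypothetical stand-in for rlib's cksum(): one's-complement sum of
 * 16-bit words, returned in network byte order. */
static uint16_t demo_cksum(const void *buf, int len) {
  const uint8_t *p = buf;
  uint32_t sum = 0;
  for (; len >= 2; p += 2, len -= 2)
    sum += (uint32_t)p[0] << 8 | p[1];
  if (len > 0) /* odd trailing byte is treated as the high byte */
    sum += (uint32_t)p[0] << 8;
  while (sum > 0xffff)
    sum = (sum >> 16) + (sum & 0xffff);
  return htons((uint16_t)~sum);
}

/* Fill in a Data packet; returns its total length, or 0 if n is too big.
 * A call with n == 0 produces the 12-byte end-of-file packet. */
static size_t make_data_packet(packet_t *pkt, uint32_t seqno, uint32_t ackno,
                               const void *payload, size_t n) {
  if (n > MAX_PAYLOAD)
    return 0;
  size_t total = DATA_HDR_LEN + n;
  memset(pkt, 0, sizeof *pkt);
  pkt->len = htons((uint16_t)total);
  pkt->ackno = htonl(ackno);
  pkt->seqno = htonl(seqno);
  if (n > 0)
    memcpy(pkt->data, payload, n);
  pkt->cksum = 0; /* checksum is computed over the packet with cksum == 0 */
  pkt->cksum = demo_cksum(pkt, (int)total);
  return total;
}

/* Validate a packet given the number of bytes actually received. */
static int packet_ok(const packet_t *pkt, size_t received) {
  if (received < ACK_LEN)
    return 0;
  size_t len = ntohs(pkt->len);
  if (len > received || len > sizeof *pkt)
    return 0; /* truncated, or len field is nonsense */
  if (len != ACK_LEN && len < DATA_HDR_LEN)
    return 0; /* neither an Ack nor a Data packet */
  packet_t copy;
  memcpy(&copy, pkt, len);
  uint16_t want = copy.cksum;
  copy.cksum = 0;
  return demo_cksum(&copy, (int)len) == want;
}
```

Note that packet_ok trusts the len field only after bounding it by the received byte count, so a padded datagram is still accepted while a truncated one is rejected.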

To conserve packets, a sender should not send more than one unacknowledged Data frame with less than the maximum number of bytes, 500. (This behavior is somewhat akin to TCP's Nagle algorithm, which we will discuss in lecture.) Note that this is not the same as limiting the window size in general; this applies only when a "small" packet is sent and not yet acknowledged.
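One way to enforce this small-packet rule is to remember whether a short Data packet is currently unacknowledged. The sketch below assumes a hypothetical small_pkt_gate structure and helper names of our own choosing; it is separate from the window-size check, which you would still apply on top of it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PAYLOAD 500

/* Hypothetical sender-side bookkeeping for the small-packet rule. */
struct small_pkt_gate {
  bool small_outstanding; /* a short packet is sent but not yet acked */
  uint32_t small_seqno;   /* seqno of that short packet */
};

/* May a packet with this payload size be sent right now? */
static bool may_send(const struct small_pkt_gate *g, size_t payload_len) {
  if (payload_len >= MAX_PAYLOAD)
    return true; /* full packets are limited only by the window */
  return !g->small_outstanding;
}

/* Record that a packet was transmitted. */
static void note_sent(struct small_pkt_gate *g, uint32_t seqno,
                      size_t payload_len) {
  if (payload_len < MAX_PAYLOAD) {
    g->small_outstanding = true;
    g->small_seqno = seqno;
  }
}

/* Record a cumulative ack; ackno is the next seqno the peer expects. */
static void note_ack(struct small_pkt_gate *g, uint32_t ackno) {
  if (g->small_outstanding && ackno > g->small_seqno)
    g->small_outstanding = false;
}
```

With this gate, a 120-byte packet blocks a following 80-byte packet until it is acknowledged, but does not block a full 500-byte packet.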

Requirements

Your transport layer must support the following:

Each side's output should be identical to the other side's input, regardless of a lossy, congested, or corrupting network layer. You will ensure reliable transport by having the recipient acknowledge packets received from the sender; the sender will detect missing acknowledgments and resend the dropped or corrupted packets.
Your server side should handle multiple client connections through demultiplexing in phase 2.
As reliable transport is inherently a stateful protocol, where it needs to make decisions on what to do based on past protocol interactions between a communicating pair, your transport layer should handle simple connection establishment. In this assignment, the server can detect a new connection when it receives a packet with a sequence number of 1 (which should always be the sequence number of the first packet in a new connection).
You should handle connection teardown properly. When you read an EOF on the sender/client side of a connection, you should send a zero-length payload (12-byte packet) to the other side to indicate the end of file condition. When, at the receiver/server, you receive a zero-length payload (and have written/passed along the contents of all previous packets), you should convey an EOF to your output sink by calling conn_output with a len of 0.
For Phase 1, the window size is just one packet (the default). For Phase 2, you have to support larger window sizes. The window size is supplied by the -w command-line option, which will show up as the window field in the config_common data structure passed to the rel_create and rel_demux functions you implement. See the provided rlib.h file for this struct definition.
Your server and client should ensure that data is written in the correct order, even if the network layer reordered packets. Your receiver should buffer as many packets as the client may send concurrently. In other words, the sender window size (SWS) should equal the receiver window size (RWS), and both should be the same as the window field in the config_common structure.
The sender should resend a packet if the receiver does not acknowledge it within an appropriate time period. You need not implement any backoff like TCP, and can instead merely send packet(s) whenever a sent packet has gone unacknowledged for the timeout period. The timeout period in milliseconds is supplied to you by the timeout field of the config_common structure. The default is 2000 msec, but you may change this with the -t command-line option.
Again, acknowledgements should be cumulative rather than selective. Remember that like TCP, you acknowledge the next sequence number you are expecting to receive, which is 1 more than the largest in-order sequence number you have received. You don't have to handle sequence number overflowing and wrapping in the lifetime of a connection.
You can retry packets infinitely many times, and should make sure you retry at least FIVE times, after which, if you want, the transport endpoint can terminate the connection with an error. You should not leak memory on terminated connections, so take care to clean up allocated structures for any such connections. You can call rel_destroy to destroy the state associated with a connection when you give up on retransmitting.
Note 1: Hopefully, it should be clear that the above specification is not really GoBackN nor SelectiveRepeat from your textbook. It is something of a hybrid between some of the transport choices made by TCP and the SR protocols.
Note 2: For debugging printfs you should use the Standard Error fprintf (stderr, ...) and not print on standard output. This is because standard output is being used for the actual program output and it will be confusing for the grader as well as the tester.
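The timeout and retry requirements above could be tracked per packet roughly as follows. This is a sketch, not the required design: the unacked record, the timer_action values, and check_timeout are hypothetical names, and the current time is passed in as a millisecond count so that the logic does not depend on any particular clock call.

```c
#include <stdbool.h>
#include <stdint.h>

#define MIN_RETRIES 5 /* the assignment requires at least five retries */

/* Hypothetical record kept for each sent-but-unacknowledged packet. */
struct unacked {
  uint32_t seqno;
  uint64_t last_sent_ms; /* time of the most recent (re)transmission */
  int retries;           /* retransmissions so far */
  bool in_use;
};

enum timer_action { TIMER_WAIT, TIMER_RESEND, TIMER_GIVE_UP };

/* Decide what to do with one packet when the periodic timer fires.
 * timeout_ms would come from the timeout field of config_common. */
static enum timer_action check_timeout(struct unacked *u, uint64_t now_ms,
                                       uint32_t timeout_ms, int max_retries) {
  if (!u->in_use || now_ms - u->last_sent_ms < timeout_ms)
    return TIMER_WAIT; /* not due yet: do not resend on every tick */
  if (u->retries >= max_retries)
    return TIMER_GIVE_UP; /* caller may report an error and clean up */
  u->retries++;
  u->last_sent_ms = now_ms; /* restart the timeout for this packet */
  return TIMER_RESEND;
}
```

Because each record carries its own last_sent_ms, a timer that fires more often than the timeout period only retransmits the packets that are actually overdue.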

Implementation Details

There are two modes of operation of the reliable transport protocol:

The first mode is single-connection mode, and connects, through your reliable transport protocol implementation, the standard input and output of the two endpoint processes together. The second is multi-connection mode, in which one endpoint process accepts TCP (or unix-domain) socket connections and relays them over your reliable transport protocol to a server that, for each demux endpoint, connects to a TCP port or unix-domain socket.

You are provided with a library (rlib.h/rlib.c) and your task is to implement the following seven functions: rel_create, rel_destroy, rel_recvpkt, rel_demux (Phase 2), rel_read, rel_output, rel_timer:

rel_create: The reliable_state structure, defined in reliable.c, is intended to encapsulate the state of each connection. The structure is typedefed to rel_t in rlib.h, but the contents of the structure are defined in reliable.c, where you should add more fields as needed to keep your per-connection state. A rel_t is created by this rel_create function. When running in single-connection or client mode, the rlib library will call rel_create directly for you. When running as a server, you will need to invoke rel_create yourself from within rel_demux when you notice a new connection, which will show up as a packet with sequence number 1 from a sockaddr_storage that you have not seen before (you can test for whether you have seen a connection before by using addreq(const struct sockaddr_storage *, const struct sockaddr_storage *) to compare a packet's source address to addresses you have seen before).
rel_destroy: A rel_t is deallocated by rel_destroy(). The rlib library will call rel_destroy when it receives and detects an ICMP port unreachable (signifying that the other end of the connection has died). You should also call rel_destroy when all of the following hold:
- You have read an EOF from the other side (i.e., a Data packet of len 12, where the payload field is 0 bytes).
- You have read an EOF or error from your input (conn_input returned -1).
- All packets you have sent have been acknowledged.
- You have written all output data with conn_output.
Note that to be correct, at least one side should also wait around for twice the maximum segment lifetime in case the last ack it sent got lost, the way TCP uses the FIN_WAIT state, but this is not required.
rel_recvpkt and rel_demux: When a packet is received, the rlib library will call either rel_recvpkt or rel_demux. rel_recvpkt is called when running in single-connection or client mode. In that case, the library already knows what rel_t to use for the particular UDP port receiving the packet, and supplies you with the rel_t. In the case of the server when not in single-connection mode, all UDP packets go to the same port, so the rlib library calls rel_demux instead; there you must demultiplex the connections yourself and make the appropriate calls to rel_recvpkt.
rel_read: To obtain from the "application" the data that you must transmit to the receiver, call conn_input. conn_input reads from standard input when running in single-connection mode, and from a TCP connection when running in client or server mode (thus abstracting away which mode you are in from the protocol implementation). If no data is available, conn_input will return 0. At that point, the rlib library will call rel_read once data is available again, so that you can once again call conn_input. Do not loop calling conn_input if it returns 0; simply return and wait for the rlib library to invoke rel_read!
rel_output: To deliver output data to the "application" that you have received in decoded UDP packets, call conn_output. This function outputs data either to STDOUT or to a TCP connection, depending on the mode of operation. You may find the function conn_bufspace useful--it tells you how much space is available for use by conn_output. If you try to write more than this, conn_output may return that it has accepted fewer bytes than you gave it. You must flow-control the sender by not acknowledging packets if there is no buffer space available for conn_output. The rlib library calls rel_output when output has drained (i.e. been consumed by the "application"), at which point you can call conn_bufspace to see how much buffer space you have and send out more Acks to get more data from the remote side.
rel_timer: The function rel_timer is called periodically by the rlib library, currently at a rate 1/5 of the retransmission interval. You can use this timer to inspect packets and retransmit packets that have not been acknowledged. Do not retransmit every packet every time the timer is fired! You must keep track of which packets need to be retransmitted when.
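The four conditions listed under rel_destroy can be combined into a single predicate that you re-check whenever one of them might have changed (after an Ack arrives, after output drains, after reading EOF). The sketch below assumes a hypothetical teardown_state with fields you might add to reliable_state; none of these names come from rlib.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-connection fields relevant to teardown. */
struct teardown_state {
  bool peer_eof_received; /* got a Data packet with len 12 from the peer */
  bool input_eof;         /* conn_input returned -1 (EOF or error) */
  uint32_t next_seqno;    /* next seqno we would assign; starts at 1 */
  uint32_t last_ackno;    /* highest cumulative ackno from the peer */
  uint32_t buffered_out;  /* received packets not yet given to conn_output */
};

/* True when all four conditions for calling rel_destroy hold. */
static bool ready_to_destroy(const struct teardown_state *s) {
  /* ackno names the next packet the peer expects, so everything we sent
   * (including our own EOF packet) is acked when it equals next_seqno. */
  bool all_acked = (s->last_ackno == s->next_seqno);
  bool all_written = (s->buffered_out == 0);
  return s->peer_eof_received && s->input_eof && all_acked && all_written;
}
```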

Phase 1

While you could develop this project on any of the Linux machines in Olin 219, we want to plan for testability. Since your endpoints for communication run over UDP, we want to be able to control the link (and in particular, its latency, bandwidth, and loss characteristics) over which the UDP packets are transmitted. Your mininet virtual machine, and the virtual network topology it provides, are a perfect match for this requirement. In addition, since you have root access inside the virtual machine, we can add the dmalloc facility, helpful for debugging memory management through malloc().

We will start with a discussion of how to run your built program (or the provided reference program) and then, in the Getting Started section, discuss getting the software base and making sure you have installed some of the packages for effective development. The discussion from here assumes you have downloaded the base software through Mercurial, made the modifications to the file reliable.c for the implementation of the above set of functions possibly adding other .h and .c files for closely related sets of functions, and built the executable using make, resulting in the executable file reliable.

When you are done with Phase 1, two instances of reliable should be able to communicate with one another. An example of the working program is given here.

On your mininet virtual machine, start up mininet:

mininet@mininet-vm:~/test/reliable$ sudo mn -x

This starts up mininet with the default topology and launches an xterm for each of the two hosts, h1 and h2 (as well as for the switch and the controller). This should be done on a terminal where you have ssh'd into the mininet VM with the -X option to ssh to enable "remote" GUI display.

In the host1 xterm, run:

root@mininet-vm:~/test/reliable# ./reliable 6666 10.0.0.2:5555
[listening on UDP port 6666]
Hello I am typing this on host1.

In the host2 xterm, run:

root@mininet-vm:~/test/reliable# ./reliable 5555 10.0.0.1:6666
[listening on UDP port 5555]
Hello I am typing this on host2.

Now anything typed on host1 will show up on host2 and vice versa. Note that you can use the provided reference executable on either (or both) ends of this communication to test and help debug your implementation. The reference is an x86_64 Linux binary suitable for execution on the mininet architecture.

For debugging purposes, you may also find it useful to run ./reliable with the -d command-line option. This option will print all the packets your implementation sends and receives.

Phase 2

For Phase 2, you will extend your solution to Phase 1 to support two additional features:

A sliding send and receive window larger than one packet, and
Connection demultiplexing.

The first feature is relatively straightforward. When you run the reliable program with the -w argument, it should set the sender and receiver window sizes to be whatever the supplied argument is. For example, the following command should select a window size of 5:

root@mininet-vm:~/test/reliable# ./reliable -w 5 1111 10.0.0.2:2222
[listening on UDP port 1111]

The value specified for the -w argument is stored in the window field of the config_common data structure. You should access it as cc->window in the rel_create function, and store the value somewhere in the reliable_state structure so you have access to it in other functions.
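As a rough illustration of keeping the window size with your per-connection state and using it to limit outstanding packets, consider the sketch below. The config_common_demo structure is a hypothetical stand-in that carries only the two fields this handout names (window and timeout); the real config_common in rlib.h may differ. The send_window structure and its helpers are likewise made-up names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for rlib's config_common (see rlib.h for the
 * real definition); only the fields mentioned in this handout. */
struct config_common_demo {
  int window;  /* from -w */
  int timeout; /* from -t, in milliseconds */
};

/* Hypothetical sender-side window state kept per connection. */
struct send_window {
  uint32_t window;     /* copied from cc->window at creation time */
  uint32_t base;       /* oldest unacknowledged seqno */
  uint32_t next_seqno; /* next seqno to assign to a new packet */
};

static void send_window_init(struct send_window *w,
                             const struct config_common_demo *cc) {
  w->window = (uint32_t)cc->window;
  w->base = 1; /* the first sequence number in a connection is 1 */
  w->next_seqno = 1;
}

/* True if another new packet may be sent without exceeding the window. */
static bool window_has_room(const struct send_window *w) {
  return w->next_seqno - w->base < w->window;
}

static uint32_t window_take_seqno(struct send_window *w) {
  return w->next_seqno++;
}

/* Cumulative ack: ackno is the next seqno the peer is waiting for. */
static void window_on_ack(struct send_window *w, uint32_t ackno) {
  if (ackno > w->base && ackno <= w->next_seqno)
    w->base = ackno; /* stale or out-of-range acks are ignored */
}
```

With a window of 1 this reduces to stop and wait: window_has_room is false as soon as one packet is outstanding.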

Connection demultiplexing is used when running the reliable program in server mode, which is selected by the -s switch. For example, the following command runs reliable in server mode:

root@mininet-vm:~/test/reliable# ./reliable -s -w 5 1111 10.0.0.1:2222
[listening on UDP port 1111]

Unlike single-connection mode, which you've been using up until this point, in server mode the argument 10.0.0.1:2222 specifies a TCP, rather than UDP, port. At this point reliable may accept multiple connections from different clients on different client UDP ports, all sending packets to port 1111 on the server. The reliable program will get all of these packets, but since they are all destined to the same UDP port, the rlib code doesn't know which connection they belong to. Therefore, received packets will be passed to the function rel_demux.

In server mode, the library never calls rel_recvpkt. Instead, you must look up the rel_t structure for a packet based on the client's UDP sockaddr_storage. You will find the addreq function that compares two sockaddr_storage structures for equality useful here.
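The lookup described above might be organized as a linked list keyed by the client's address. The sketch below is self-contained and therefore uses assumptions: same_peer is a hypothetical IPv4-only stand-in for the addreq function in rlib, conn_entry stands in for your rel_t, and the point where a real implementation would call rel_create is marked with a comment.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical per-connection record; in the assignment this is rel_t. */
struct conn_entry {
  struct sockaddr_storage peer;
  struct conn_entry *next;
};

/* Hypothetical IPv4-only stand-in for rlib's addreq(). */
static int same_peer(const struct sockaddr_storage *a,
                     const struct sockaddr_storage *b) {
  if (a->ss_family != AF_INET || b->ss_family != AF_INET)
    return 0;
  const struct sockaddr_in *x = (const struct sockaddr_in *)a;
  const struct sockaddr_in *y = (const struct sockaddr_in *)b;
  return x->sin_port == y->sin_port &&
         x->sin_addr.s_addr == y->sin_addr.s_addr;
}

/* Helper for building test addresses (ip and port in host byte order). */
static struct sockaddr_storage make_peer(uint32_t ip, uint16_t port) {
  struct sockaddr_storage ss;
  memset(&ss, 0, sizeof ss);
  struct sockaddr_in *sin = (struct sockaddr_in *)&ss;
  sin->sin_family = AF_INET;
  sin->sin_port = htons(port);
  sin->sin_addr.s_addr = htonl(ip);
  return ss;
}

/* Find the connection for a packet's source address. A new entry is
 * created only when an unknown peer sends seqno 1; any other packet from
 * an unknown peer yields NULL and should be dropped. */
static struct conn_entry *demux(struct conn_entry **head,
                                const struct sockaddr_storage *from,
                                uint32_t seqno) {
  struct conn_entry *c;
  for (c = *head; c != NULL; c = c->next)
    if (same_peer(&c->peer, from))
      return c;
  if (seqno != 1)
    return NULL;
  c = calloc(1, sizeof *c); /* a real rel_demux would call rel_create here */
  if (c == NULL)
    return NULL;
  c->peer = *from;
  c->next = *head;
  *head = c;
  return c;
}
```

A retransmitted first packet from a known peer matches the existing entry, so it does not create a duplicate connection.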

In server mode, reliable input and output no longer come from standard input and output. Instead, for each new connection set up, the library creates a TCP connection to the TCP port specified (10.0.0.1:2222 in the example above). There is a utility uc that came with the distribution that allows you to listen to a particular TCP port, so that you can test your library. Just run, e.g., ./uc -l 2222 to listen for one connection on a particular TCP port. (You'll have to run it again in a different terminal if you want to accept more than one connection.)

There is also a client mode, selected by -c. You shouldn't need any special support in your software for client mode, as long as you are using the rel_t structure correctly. Client mode allows you to accept TCP connections and relay them to a reliable server on a particular UDP port. For instance:

root@mininet-vm:~/test/reliable# ./reliable -c -w 5 3333 10.0.0.1:1111
[listening on UDP port 3333]

The above command accepts connections on TCP port 3333, and for each connection, allocates a new UDP port and uses that port to talk to a reliable server listening on port 1111. The uc command without the -l flag allows you to connect to a TCP port. For instance, to test the above, run ./uc localhost 3333.

Getting Started

To get started, you will want to launch your mininet virtual machine and then use it to access the Mercurial repository with your code base.

The best way to download the assignment source code is to use Mercurial to clone your user-specific repository, by executing the following command:

    hg clone http://140.141.132.4:8001/cs375-login

where login is replaced by your Olin/MathCS login id. Mercurial (hg) is a powerful version control system that will make it easy for you to checkpoint your work and later browse your history to track down problems if you have introduced a bug. Using hg will also make it easy for you to update your source tree should I need to make corrections/bug fixes to the rlib infrastructure of the project assignment. While use of hg is not required for this class, if you invest the time to learn hg now, you will likely benefit far into the future. A quick search of the web will turn up a number of high-quality tutorials, and you are always welcome and encouraged to talk with me about it.

If an update to the assignment is required, I will lead you through how to merge the changes in the base with the modifications you have made in pursuing your assignment.

Conceptual Questions

Here are some conceptual questions which may help you better understand the assignment and how to go about implementing it, as well as its relationship to TCP. You don't need to answer these questions in your submission: they are purely for your benefit.

One of the requirements of the assignment is to implement basic flow control (i.e. packets must not be acknowledged until they are output via conn_output). How would reliable behave if this requirement were relaxed? Would it function incorrectly? Is flow control purely for the sake of reducing internet congestion?
The data segment of each packet_t is only 500 bytes. Though perhaps trickier to implement, a variable sized field could support sending much more data (up to 65535 bytes) per UDP packet. Assess this alternate approach.
Another requirement of this assignment is to ensure that there is no more than one unacknowledged data frame with less than the maximum number of bytes (akin to TCP's Nagle algorithm). Which kinds of applications or usage would benefit from this, and which would drastically suffer?
This assignment counts and acknowledges entire packets, while TCP counts and acknowledges at the byte level. Assess the merits of this approach versus TCP's.
List three key limitations of Reliable, and outline how, if at all, TCP addresses them.

Grading

75% of your assignment grade will be functionality/execution tests. 25% of your grade will be based on the quality and readability of your code. We understand readability can be a subjective measure, and don't want to enforce particular coding expectations. But generally speaking, we expect your code to:

Be readable at the statement and block level, with reasonable naming conventions and consistent formatting/spacing
Have comments that shed light on particularly complex or difficult pieces of code
Have a principled, well-considered design and units of functionality. Repeated code, enormous functions, and a general lack of structure will make it very hard to understand what you're doing. Your README should describe this structure.

A programming language is a language which needs to not only communicate behavior to the computer, but also to a human reader. Writing elegant, easy-to-understand code is a critical skill that we want to continue to develop.

Submitting

To submit the assignment, you must do three things:

Use Mercurial to commit your code.
Tag your code (for each phase) so that regardless of other updates, the submitted version of each phase has a consistent tag applied to each file. In Mercurial parlance, tagging creates a changeset, which itself needs to be pushed to the repository. If you have not created a tag before, you tag your repository by issuing the command:
```
hg tag project2phase1-submit
```
Push your code, so that the Denison CS repository storage gets the up-to-date version.

Since Dr. B is also an owner of all of the individual repositories, he will be able to pull and update his working repository with your changes to test your code.

Collaboration policy

You should direct most questions to Piazza, but should not post source code there.

You must write all the code you hand in for the programming assignments, except for code that we give you as part of the assignment and system library code. You are not allowed to show your code to anyone else in the class or look at anyone else's solution. You may discuss the assignments with other students, but do not copy each others' code.

F.A.Q.

The following is merely a working list of logistical or anticipated questions.

Is using c99 syntax ok? How do I use c99 syntax?
Yes. To tell gcc to use c99, add -std=c99 to CFLAGS in the Makefile.
Can we assume that the UDP packet length received is correct?
No. As stated above, you must examine the length field, and should not assume that the UDP packet you receive is the correct length. The network might truncate or pad packets.
How are rel_output, conn_output, and conn_bufspace related?
- conn_output -- outputs data (to STDOUT or a TCP connection, depending on the mode of operation). Call this function with received data to be delivered to the application.
- conn_bufspace -- returns the space available for use with conn_output. Calls to conn_output have limited space, as there is an underlying buffer. If you call conn_output with more data than it can handle, conn_output may return that it has accepted fewer bytes. In order to avoid passing in too much data, call conn_bufspace to find out how much space is available.
- rel_output -- called by the library when output has drained. Once the data passed in to conn_output is all sent, the library will call rel_output in order to continue processing any more data available.
Should Acks be piggybacked on top of outgoing data packets?
Piggybacking Acks is preferable but not required.
Say packets 1-5 are received. I output 1-3, but I do not have space to output 4 and 5 (even though I have them buffered). What should I do?
Send an Ack with ackno 6 only once you have space to output 4 and 5. This helps to rate limit the sender.
Do we retransmit each packet individually or always just retransmit all unacknowledged packets?
Either is fine for correctness, but the preferred thing is to retransmit just one packet (rather than a whole window).
In server mode, where does the server get data that needs to be sent?
The server gets the data that it needs to send from packets that arrive in rel_demux.
Once the server gets a data packet to send, how does it know where to send it?
The server manages multiple connections between many clients (one client per entry in the rel_t linked list). It sends the packet that it receives out through a call to conn_output. Use the connection from the correct rel_t struct as the first argument for conn_output.
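The flow-control answer above (packets 1-5 received, room to output only 1-3) can be made concrete with a small receive-buffer sketch. The names here are hypothetical, the window size is fixed at 8 purely for illustration, and the space argument plays the role of the value returned by conn_bufspace; a real implementation would call conn_output where the comment indicates.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RWS 8 /* illustrative receive window; really the window field */

struct rx_slot {
  bool filled;
  size_t len; /* payload bytes held in this slot */
};

/* Hypothetical receiver state. */
struct receiver {
  uint32_t next_deliver;     /* next seqno to output; also our ackno */
  struct rx_slot slots[RWS]; /* slot for seqno s is slots[s % RWS] */
};

/* Buffer an arriving Data packet if it falls inside the receive window. */
static bool rx_store(struct receiver *r, uint32_t seqno, size_t payload_len) {
  if (seqno < r->next_deliver || seqno >= r->next_deliver + RWS)
    return false; /* duplicate of delivered data, or beyond the window */
  r->slots[seqno % RWS].filled = true;
  r->slots[seqno % RWS].len = payload_len;
  return true;
}

/* Output buffered packets in order while they fit in `space` bytes, and
 * return the ackno to send: one past the last packet actually output. */
static uint32_t rx_drain(struct receiver *r, size_t space) {
  for (;;) {
    struct rx_slot *s = &r->slots[r->next_deliver % RWS];
    if (!s->filled || s->len > space)
      break; /* gap in the sequence, or no room for the next packet */
    /* a real implementation would call conn_output() here */
    space -= s->len;
    s->filled = false;
    r->next_deliver++;
  }
  return r->next_deliver;
}
```

With five buffered 100-byte packets and 300 bytes of space, rx_drain returns 4; once 200 more bytes are free, a second call returns 6, which is when the Ack for packets 4 and 5 goes out.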

Computer Science 375 Computer Networks