Project 2, Phases 1 & 2: Reliable Transport
Phase 2 Due: Friday, Feb. 28 @ 11:59PM (tentative).
Introduction
Your task is to implement a reliable, stop and wait (Phase 1) and sliding window (Phase 2) transport layer on top of the user datagram protocol (UDP). You will use IP addresses and UDP port numbers to demultiplex traffic, but not otherwise rely on UDP--in particular you should not rely on UDP's checksum to detect bit errors in packets.
The assignment is split into two phases. In the first phase, you just need to support a single direct connection between two UDP ports, one on the server and the other on the client; you can use the stop and wait protocol for this. The picture below shows how the system should look after Phase 1 is implemented:

In Phase 2, you extend the functionality to support demultiplexing of several connections at the server as well as to support a sliding window with sizes > 1. (Recall that stop and wait is equivalent to sliding window with a window size of 1). The picture below shows how the system should look after Phase 2 is implemented:

In this assignment, you are provided with a library (rlib.h and rlib.c) and you have to implement some functions and data structures for which skeletons are provided (in reliable.c). You will probably find it useful to look through rlib.h and rlib.c, as several useful helper functions have been provided. These abstract away many of the input and output differences between the two configurations pictured above, as well as the I/O details, so that you can focus on the reliable transport protocol itself.
In general your implementation should:
- Handle packet drops.
- Handle packet corruption.
- Provide trivial flow control.
- Provide a stream abstraction.
- Allow multiple packets to be outstanding at any time (using a limit given to your program as a run-time parameter, via the -w option).
- Handle packet reordering.
- Detect any single-bit errors in packets.
You will implement both the client and server component of a transport layer, both sides available in the same executable, and selected via command line options. The client will read a stream of data in (either from STDIN, in the first phase, or from a reliable TCP connection in Phase 2), break it into fixed-sized packets suitable for UDP transport, prepend a control header to the data, and transmit this packet to the server. The server will read these packets and write the corresponding data, in order, to a reliable stream (STDOUT in Phase 1, and a TCP connection in Phase 2).
One of the simplifications of this assignment over what we study in TCP is that we are implementing a packet-oriented transport, as opposed to a byte stream transport. Since we do not have to keep track of byte offsets, our sequence numbers and acknowledgements refer to whole packets rather than bytes.
Packet types and fields
There are two kinds of packets, Data packets and Ack-only packets. You can tell the type of a packet by its length. Ack-only packets are 8 bytes, while Data packets vary from 12 to 512 bytes. The packet format is defined in rlib.h:
struct packet {
uint16_t cksum; /* Ack and Data */
uint16_t len; /* Ack and Data */
uint32_t ackno; /* Ack and Data */
uint32_t seqno; /* Data only */
char data[500]; /* Data only; Not always 500 bytes, can be less */
};
typedef struct packet packet_t;
Every Data packet contains a 32-bit sequence number as well as 0 or more bytes of payload. The len, seqno, and ackno fields are always in network byte order (meaning you will have to use htonl/htons to write those fields and ntohl/ntohs to read them). Both Data and Ack packets contain the following fields:
cksum: 16-bit IP checksum. You can set the cksum field to 0 and use the cksum(const void *, int) function (i.e. pass it a pointer/address of a packet and a length) to compute the value of the checksum that should be in there. Note that you shouldn't have to call htons on the checksum value produced by the cksum function--it is already in network byte order.
len: 16-bit total length of the packet. This value will be 8 for Ack packets, and 12 + payload-size for data packets (since 12 bytes are used for the header). An end-of-file condition is transmitted to the other side of a connection by a data packet containing 0 bytes of payload, and hence a final len of 12. Note: You must examine the length field, and should not assume that the UDP packet you receive is the correct length. The network might truncate or pad packets, resulting in a difference between the UDP length and the length maintained by you in this field.
ackno: 32-bit cumulative acknowledgment number. This says that the sender of a packet has received all packets with sequence numbers earlier than ackno, and is waiting for the packet with a seqno of ackno. Note that the ackno is the sequence number you are waiting for, that you have not received yet. The first sequence number in any connection is 1, so if you have not received any packets yet, you should set the ackno field to 1.
The following fields only exist in a data packet:
seqno: Each packet transmitted in a stream of data must be numbered with a seqno. The first packet in a stream has seqno 1. Note that in TCP, sequence numbers indicate bytes. By contrast, this protocol just numbers packets. That means that once a packet is transmitted, it cannot be merged with another packet for retransmission. This should simplify your implementation.
data: Contains (len - 12) bytes of payload data for the application.
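The field rules above can be sketched in C as follows. This is only an illustration under stated assumptions: sketch_cksum is a hypothetical stand-in for the cksum function in rlib (a plain 16-bit one's-complement sum, which may differ in detail from the library's version), and build_data_packet and packet_is_valid are made-up names, not part of the provided skeleton.

```c
#include <arpa/inet.h>   /* htons, htonl, ntohs */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ACK_LEN 8        /* cksum + len + ackno */
#define HDR_LEN 12       /* Ack fields + seqno */
#define MAX_PAYLOAD 500

struct packet {
    uint16_t cksum;
    uint16_t len;
    uint32_t ackno;
    uint32_t seqno;
    char data[MAX_PAYLOAD];
};
typedef struct packet packet_t;

/* Hypothetical stand-in for rlib's cksum(): one's-complement sum of 16-bit
 * words. Words are summed in memory order, so the result is stored as-is. */
static uint16_t sketch_cksum(const void *buf, size_t n)
{
    const unsigned char *p = buf;
    uint32_t sum = 0;
    uint16_t word;

    for (; n > 1; p += 2, n -= 2) {
        memcpy(&word, p, 2);
        sum += word;
    }
    if (n == 1) {            /* odd trailing byte, padded with a zero byte */
        word = 0;
        memcpy(&word, p, 1);
        sum += word;
    }
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Fill in a Data packet and return its total length in bytes. */
size_t build_data_packet(packet_t *pkt, uint32_t seqno, uint32_t ackno,
                         const char *payload, size_t n)
{
    assert(pkt != NULL && n <= MAX_PAYLOAD);
    size_t total = HDR_LEN + n;

    memset(pkt, 0, sizeof *pkt);             /* also zeroes cksum */
    pkt->len = htons((uint16_t)total);
    pkt->ackno = htonl(ackno);
    pkt->seqno = htonl(seqno);
    if (n > 0)
        memcpy(pkt->data, payload, n);
    pkt->cksum = sketch_cksum(pkt, total);   /* computed with cksum == 0 */
    return total;
}

/* Return 1 if `received` bytes at pkt form an intact Ack or Data packet. */
int packet_is_valid(const packet_t *pkt, size_t received)
{
    packet_t copy;
    size_t len;

    if (received < ACK_LEN)
        return 0;
    len = ntohs(pkt->len);
    if (len > received)                      /* truncated in transit */
        return 0;
    if (len != ACK_LEN && (len < HDR_LEN || len > HDR_LEN + MAX_PAYLOAD))
        return 0;
    memcpy(&copy, pkt, len);
    copy.cksum = 0;
    return sketch_cksum(&copy, len) == pkt->cksum;
}
```

Note that the validity check trusts the len field only after comparing it with the number of bytes actually received, and that padding beyond len is simply ignored.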
To conserve packets, a sender should not send more than one unacknowledged Data frame with less than the maximum number of bytes, 500. (This behavior is somewhat akin to TCP's Nagle algorithm, which we will discuss in lecture.) Note that this is not the same as limiting the window size in general; this applies only when a "small" packet is sent and not yet acknowledged.
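One way to read this rule, together with the window limit, is as a single check made before each new Data packet is sent. The sketch below follows the literal reading (a second small packet is held back, a full 500-byte packet is not); the struct send_state and the functions may_send, note_sent, and note_ack are hypothetical names for illustration and do not exist in rlib.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PAYLOAD 500

/* Hypothetical sender-side bookkeeping for one connection. */
struct send_state {
    uint32_t next_seqno;   /* seqno the next new packet will carry */
    uint32_t last_acked;   /* highest ackno received so far (starts at 1) */
    uint32_t window;       /* value of the -w option */
    uint32_t small_seqno;  /* seqno of the unacknowledged small packet, 0 if none */
};

/* Return 1 if a new Data packet with payload_len bytes may be sent now. */
int may_send(const struct send_state *s, size_t payload_len)
{
    assert(s != NULL && payload_len <= MAX_PAYLOAD);
    /* Packets last_acked .. next_seqno - 1 are currently in flight. */
    if (s->next_seqno - s->last_acked >= s->window)
        return 0;
    /* At most one small packet may be unacknowledged at a time. */
    if (payload_len < MAX_PAYLOAD && s->small_seqno != 0)
        return 0;
    return 1;
}

/* Record that a new packet with payload_len bytes was just sent. */
void note_sent(struct send_state *s, size_t payload_len)
{
    if (payload_len < MAX_PAYLOAD)
        s->small_seqno = s->next_seqno;
    s->next_seqno++;
}

/* Record a cumulative acknowledgment. */
void note_ack(struct send_state *s, uint32_t ackno)
{
    if (ackno > s->last_acked)
        s->last_acked = ackno;
    if (s->small_seqno != 0 && s->last_acked > s->small_seqno)
        s->small_seqno = 0;
}
```

With this reading, interactive input (a few bytes at a time) is limited to one packet per round trip, while bulk transfers that fill every packet are limited only by the window.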
Requirements
Your transport layer must support the following:
Each side's output should be identical to the other side's input, regardless of a lossy, congested, or corrupting network layer. You will ensure reliable transport by having the recipient acknowledge packets received from the sender; the sender will detect missing acknowledgments and resend the dropped or corrupted packets.
Your server side should handle multiple client connections through demultiplexing in phase 2.
As reliable transport is inherently a stateful protocol, where it needs to make decisions on what to do based on past protocol interactions between a communicating pair, your transport layer should handle simple connection establishment. In this assignment, the server can detect a new connection when it receives a packet with a sequence number of 1 (which should always be the sequence number of the first packet in a new connection).
You should handle connection teardown properly. When you read an EOF on the sender/client side of a connection, you should send a zero-length payload (12-byte packet) to the other side to indicate the end of file condition. When, at the receiver/server, you receive a zero-length payload (and have written/passed along the contents of all previous packets), you should convey an EOF to your output sink by calling conn_output with a len of 0.
For Phase 1, you would have the window size to be just one packet (the default). For Phase 2, you have to support larger window sizes. The window size is supplied by the -w command-line option, which will show up as the window field in the config_common data structure passed to the rel_create and rel_demux functions you implement. See the provided rlib.h file for this struct definition.
Your server and client should ensure that data is written in the correct order, even if the network layer reordered packets. Your receiver should buffer as many packets as the client may send concurrently. In other words, the sender window size (SWS) should equal the receiver window size (RWS), and both should be the same as the window field in the config_common structure.
The sender should resend a packet if the receiver does not acknowledge it within an appropriate time period. You need not implement any backoff like TCP, and can instead merely send packet(s) whenever a sent packet has gone unacknowledged for the timeout period. The timeout period in milliseconds is supplied to you by the timeout field of the config_common structure. The default is 2000 msec, but you may change this with the -t command-line option.
Again, acknowledgements should be cumulative rather than selective. Remember that like TCP, you acknowledge the next sequence number you are expecting to receive, which is 1 more than the largest in-order sequence number you have received. You don't have to handle sequence number overflowing and wrapping in the lifetime of a connection.
You can retry packets infinitely many times, and should make sure you retry at least FIVE times, after which, if you want, the transport endpoint can terminate the connection with an error. You should not leak memory on terminated connections, so take care to clean up allocated structures for any such connections. You can call rel_destroy to destroy the state associated with a connection when you give up on retransmitting.
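The receiver-side requirements above (buffer up to a window of packets, deliver in order, acknowledge cumulatively) can be sketched as a small ring of slots. This is a minimal illustration, not the required design: struct recv_window and its functions are hypothetical names, and the sketch tracks only whether a slot is filled, where a real implementation would also store the payload and hand it to conn_output at the point marked below.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical receiver window; seqno s lives in slot s % window. */
struct recv_window {
    uint32_t expected;   /* next in-order seqno, which is also the ackno */
    uint32_t window;     /* RWS, taken from the -w option */
    uint8_t *present;    /* present[i] != 0 if slot i holds a packet */
};

/* Return 0 on success, -1 if memory could not be allocated. */
int recv_window_init(struct recv_window *rw, uint32_t window)
{
    assert(rw != NULL && window > 0);
    rw->expected = 1;    /* the first seqno in any connection is 1 */
    rw->window = window;
    rw->present = calloc(window, 1);
    return rw->present != NULL ? 0 : -1;
}

/* Record the arrival of seqno and return the cumulative ackno to send. */
uint32_t recv_window_accept(struct recv_window *rw, uint32_t seqno)
{
    /* Keep only packets inside [expected, expected + window). */
    if (seqno >= rw->expected && seqno - rw->expected < rw->window)
        rw->present[seqno % rw->window] = 1;

    /* Slide past every packet that is now in order. */
    while (rw->present[rw->expected % rw->window]) {
        /* A real receiver would deliver this packet's payload here. */
        rw->present[rw->expected % rw->window] = 0;
        rw->expected++;
    }
    return rw->expected;
}
```

Duplicates and packets beyond the window leave the state unchanged, but the function still returns the current ackno, so the sender can be re-acknowledged if an earlier Ack was lost.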
Note 1: Hopefully, it should be clear that the above specification is not really GoBackN nor SelectiveRepeat from your textbook. It is something of a hybrid between some of the transport choices made by TCP and the SR protocols.
Note 2: For debugging printfs you should use the Standard Error, fprintf (stderr, ...), and not print on standard output. This is because standard output is being used for the actual program output and it will be confusing for the grader as well as the tester.
Implementation Details
There are two modes of operation of the reliable transport protocol:
The first mode is single-connection mode, and connects, through your reliable transport protocol implementation, the standard input and output of the two endpoint processes together. The second is multi-connection mode, in which one endpoint process accepts TCP (or unix-domain) socket connections and relays them over your reliable transport protocol to a server that, for each demux endpoint, connects to a TCP port or unix-domain socket.
You are provided with a library (rlib.h/rlib.c) and your task is to implement the following seven functions: rel_create, rel_destroy, rel_recvpkt, rel_demux (Phase 2), rel_read, rel_output, rel_timer:
rel_create: The reliable_state structure, defined in reliable.c, is intended to encapsulate the state of each connection. The structure is typedefed to rel_t in rlib.h, but the contents of the structure are defined in reliable.c, where you should add more fields as needed to keep your per-connection state. A rel_t is created by this rel_create function. When running in single-connection or client mode, the rlib library will call rel_create directly for you. When running as a server, you will need to invoke rel_create yourself from within rel_demux when you notice a new connection, which will show up as a packet with sequence number 1 from a sockaddr_storage that you have not seen before (you can test for whether you have seen a connection before by using addreq(const struct sockaddr_storage *, const struct sockaddr_storage *) to compare a packet's source address to addresses you have seen before).
rel_destroy: A rel_t is deallocated by rel_destroy(). The rlib library will call rel_destroy when it receives and detects an ICMP port unreachable (signifying that the other end of the connection has died). You should also call rel_destroy when all of the following hold:
- You have read an EOF from the other side (i.e., a Data packet of len 12, where the payload field is 0 bytes).
- You have read an EOF or error from your input (conn_input returned -1).
- All packets you have sent have been acknowledged.
- You have written all output data with conn_output.
Note that to be correct, at least one side should also wait around for twice the maximum segment lifetime in case the last ack it sent got lost, the way TCP uses the FIN_WAIT state, but this is not required.
rel_recvpkt and rel_demux: When a packet is received, the rlib library will call either rel_recvpkt or rel_demux. rel_recvpkt is called when running in single-connection or client mode. In that case, the library already knows what rel_t to use for the particular UDP port receiving the packet, and supplies you with the rel_t. In the case of the server when not in single-connection mode, all UDP packets go to the same port, so the rlib library calls rel_demux instead; there you must demultiplex the connections yourself and make the appropriate calls to rel_recvpkt.
rel_read: To obtain from the "application" the data that you must transmit to the receiver, call conn_input. conn_input reads from standard input when running in single-connection mode, and from a TCP connection when running in client or server mode (thus abstracting away which mode you are in from the protocol implementation). If no data is available, conn_input will return 0. At that point, the rlib library will call rel_read once data is available again, so that you can once again call conn_input. Do not loop calling conn_input if it returns 0; simply return and wait for the rlib library to invoke rel_read!
rel_output: To deliver output data to the "application" that you have received in decoded UDP packets, call conn_output. This function outputs data either to STDOUT or to a TCP connection, depending on the mode of operation. You may find the function conn_bufspace useful--it tells you how much space is available for use by conn_output. If you try to write more than this, conn_output may return that it has accepted fewer bytes than you gave it. You must flow-control the sender by not acknowledging packets if there is no buffer space available for conn_output. The rlib library calls rel_output when output has drained (i.e. been consumed by the "application"), at which point you can call conn_bufspace to see how much buffer space you have and send out more Acks to get more data from the remote side.
rel_timer: The function rel_timer is called periodically by the rlib library, currently once every 1/5 of the retransmission interval. You can use this timer to inspect packets and retransmit packets that have not been acknowledged. Do not retransmit every packet every time the timer is fired! You must keep track of which packets need to be retransmitted when.
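As a sketch of what rel_timer might do, the function below walks the in-flight packets and resends only those whose last transmission is at least one timeout old. Everything here is an assumption made for illustration: struct inflight, retransmit_due, and the resend callback are hypothetical, and the current time is passed in as now_ms (your code might obtain it from something like clock_gettime).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one sent-but-unacknowledged packet. */
struct inflight {
    uint32_t seqno;
    uint64_t sent_ms;   /* time of the most recent transmission */
    unsigned retries;   /* number of retransmissions so far */
    int in_use;         /* 0 once the packet has been acknowledged */
};

/* Resend every packet whose timeout has expired and return how many were
 * due. `resend` is called once per due packet and may be NULL. */
size_t retransmit_due(struct inflight *pkts, size_t n, uint64_t now_ms,
                      uint64_t timeout_ms, void (*resend)(uint32_t seqno))
{
    size_t due = 0;

    assert(pkts != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!pkts[i].in_use || now_ms - pkts[i].sent_ms < timeout_ms)
            continue;                 /* acknowledged, or not old enough yet */
        if (resend != NULL)
            resend(pkts[i].seqno);
        pkts[i].sent_ms = now_ms;     /* restart this packet's timer */
        pkts[i].retries++;
        due++;
    }
    return due;
}
```

Because each packet carries its own timestamp, a timer that fires five times per timeout period resends a given packet at most once per period rather than on every tick. The retries counter is there so that a caller can give up after some number of attempts if it chooses to.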
Phase 1
While you could develop this project on any of the Linux machines in Olin 219, we want to plan for testability. Since your endpoints for communication run over UDP, we want to be able to control the link (and in particular, its latency, bandwidth, and loss characteristics) over which the UDP packets are transmitted. Your mininet virtual machine, and the virtual network topology it provides, are a perfect match for this requirement. In addition, since you have root access inside the virtual machine, we can add the dmalloc facility, helpful for debugging memory management through malloc().
We will start with a discussion of how to run your built program (or the provided reference program) and then, in the Getting Started section, discuss getting the software base and making sure you have installed some of the packages for effective development. The discussion from here assumes you have downloaded the base software through Mercurial, made the modifications to the file reliable.c for the implementation of the above set of functions possibly adding other .h and .c files for closely related sets of functions, and built the executable using make, resulting in the executable file reliable.
When you are done with Phase 1, two instances of reliable should be able to communicate with one another. An example of the working program is given here.
On your mininet virtual machine, start up mininet:
mininet@mininet-vm:~/test/reliable$ sudo mn -x
This starts up mininet with the default topology and launches an xterm for each of the two hosts, h1 and h2 (as well as for the switch and the controller). This should be done on a terminal where you have ssh'd into the mininet vm with the -X option to ssh to enable "remote" GUI display.
In the host1 xterm, run:
root@mininet-vm:~/test/reliable# ./reliable 6666 10.0.0.2:5555
[listening on UDP port 6666]
Hello I am typing this on host1.
In the host2 xterm, run:
root@mininet-vm:~/test/reliable# ./reliable 5555 10.0.0.1:6666
[listening on UDP port 5555]
Hello I am typing this on host2.
Now anything typed on host1 will show up on host2 and vice versa. Note that you can use the provided reference executable on either (or both) ends of this communication to test and help debug your implementation. The reference is an x86_64 linux binary suitable for execution on the mininet architecture.
For debugging purposes, you may also find it useful to run ./reliable with the -d command-line option. This option will print all the packets your implementation sends and receives.
Phase 2
For Phase 2, you will extend your solution to Phase 1 to support two additional features:
- A sliding send and receive window larger than one packet, and
- Connection demultiplexing.
The first feature is relatively straight-forward. When you run the reliable program with the -w argument, it should set the sender and receiver window sizes to be whatever the supplied argument is. For example, the following command should select a window size of 5:
root@mininet-vm:~/test/reliable# ./reliable -w 5 1111 10.0.0.2:2222
[listening on UDP port 1111]
The value specified for the -w argument is stored in the window field of the config_common data structure. You should access it as cc->window in the rel_create function, and store the value somewhere in the reliable_state structure so you have access to it in other functions.
Connection demultiplexing is used when running the reliable program in server mode, which is selected by the -s switch. For example, the following command runs reliable in server mode:
root@mininet-vm:~/test/reliable# ./reliable -s -w 5 1111 10.0.0.1:2222
[listening on UDP port 1111]
Unlike single-connection mode, which you've been using up until this point, in server mode the argument 10.0.0.1:2222 specifies a TCP, rather than UDP port. At this point reliable may accept multiple connections from different clients on different client UDP ports, all sending packets to port 1111 on the server. The reliable program will get all of these packets, but since they are all destined to the same UDP port, the rlib code doesn't know which connection they belong to. Therefore, received packets will be passed to the function rel_demux.
In server mode, the library never calls rel_recvpkt. Instead, you must look up the rel_t structure for a packet based on the client's UDP sockaddr_storage. You will find the addreq function that compares two sockaddr_storage structures for equality useful here.
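The lookup described above might be sketched as a walk over a linked list of connections. The names here are assumptions for illustration: struct conn_node stands in for whatever per-connection structure you keep, and same_peer is a simplified stand-in for the addreq function from rlib that compares only IPv4 address and port. By convention of this sketch only, a seqno of 0 means the packet is an Ack and carries no sequence number.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical per-connection list node. */
struct conn_node {
    struct sockaddr_storage peer;   /* client's UDP address */
    struct conn_node *next;
};

enum demux_action { DEMUX_DELIVER, DEMUX_CREATE, DEMUX_DROP };

/* Simplified stand-in for addreq(): IPv4 address and port only. */
static int same_peer(const struct sockaddr_storage *a,
                     const struct sockaddr_storage *b)
{
    const struct sockaddr_in *x = (const struct sockaddr_in *)a;
    const struct sockaddr_in *y = (const struct sockaddr_in *)b;

    if (a->ss_family != AF_INET || b->ss_family != AF_INET)
        return 0;
    return x->sin_port == y->sin_port &&
           x->sin_addr.s_addr == y->sin_addr.s_addr;
}

/* Return the connection whose peer matches `from`, or NULL if none does. */
struct conn_node *find_conn(struct conn_node *head,
                            const struct sockaddr_storage *from)
{
    assert(from != NULL);
    for (struct conn_node *c = head; c != NULL; c = c->next)
        if (same_peer(&c->peer, from))
            return c;
    return NULL;
}

/* Decide what to do with a packet from `from` carrying `seqno`. */
enum demux_action demux_classify(struct conn_node *head,
                                 const struct sockaddr_storage *from,
                                 uint32_t seqno)
{
    if (find_conn(head, from) != NULL)
        return DEMUX_DELIVER;      /* known peer: hand to its rel_t */
    /* Unknown peer: only seqno 1 may open a new connection. */
    return seqno == 1 ? DEMUX_CREATE : DEMUX_DROP;
}
```

A linear list is enough for a handful of clients; nothing in the assignment requires a faster lookup structure.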
In server mode, reliable input and output no longer come from standard input and output. Instead, for each new connection set up, the library creates a TCP connection to the TCP port specified (10.0.0.1:2222 in the example above). There is a utility uc that came with the distribution that allows you to listen to a particular TCP port, so that you can test your library. Just run, e.g., ./uc -l 2222 to listen for one connection on a particular TCP port. (You'll have to run it again in a different terminal if you want to accept more than one connection.)
There is also a client mode, selected by -c. You shouldn't need any special support in your software for client mode, as long as you are using the rel_t structure correctly. Client mode allows you to accept TCP connections and relay them to a reliable server on a particular UDP port. For instance:
root@mininet-vm:~/test/reliable# ./reliable -c -w 5 3333 10.0.0.1:1111
[listening on UDP port 3333]
The above command accepts connections on TCP port 3333, and for each connection, allocates a new UDP port and uses that port to talk to a reliable server listening on port 1111. The uc command without the -l flag allows you to connect to a TCP port. For instance, to test the above, run ./uc localhost 3333.
Getting Started
To get started, you will want to launch your mininet virtual machine and then use it to access the Mercurial repository with your code base.
The best way to download the assignment source code is to use Mercurial to clone your user-specific repository, by executing the following command:
hg clone http://140.141.132.4:8001/cs375-login
where login is replaced by your Olin/MathCS login id. Mercurial (hg) is a powerful version control system that will make it easy for you to checkpoint your work and later browse your history to track down problems if you have introduced a bug. Using hg will also make it easy for you to update your source tree should I need to make corrections/bug fixes to the rlib infrastructure of the project assignment. While use of hg is not required for this class, if you invest the time to learn hg now, you will likely benefit far into the future. A quick search of the web will result in a number of high quality tutorials, or you are always welcome and encouraged to talk with me about it.
If an update to the assignment is required, I will lead you through how to merge the changes in the base with the modifications you have made in pursuing your assignment.
Conceptual Questions
Here are some conceptual questions which may help you better understand the assignment and how to go about implementing it, as well as its relationship to TCP. You don't need to answer these questions in your submission: they are purely for your benefit.
One of the requirements of the assignment is to implement basic flow control (i.e. packets must not be acknowledged until they are outputted via conn_output). How would reliable behave if this requirement was relaxed? Would it function incorrectly? Is flow control purely for the sake of reducing internet congestion?
The data segment of each packet_t is only 500 bytes. Though perhaps trickier to implement, a variable sized field could support sending much more data (up to 65535 bytes) per UDP packet. Assess this alternate approach.
Another requirement of this assignment is to ensure that there is no more than one unacknowledged data frame with less than the maximum number of bytes (akin to TCP's Nagle algorithm). Which kinds of applications or usage would benefit from this, and which would drastically suffer?
This assignment counts and acknowledges entire packets, while TCP counts and acknowledges at the byte level. Assess the merits of this approach versus TCP's.
List three key limitations of Reliable, and outline how, if at all, TCP addresses them.
Grading
75% of your assignment grade will be functionality/execution tests. 25% of your grade will be based on the quality and readability of your code. We understand readability can be a subjective measure, and don't want to enforce particular coding expectations. But generally speaking, we expect your code to:
- Be readable at the statement and block level, with reasonable naming conventions and consistent formatting/spacing
- Have comments that shed light on particularly complex or difficult pieces of code
- Have a principled, well-considered design and units of functionality. Repeated code, enormous functions, and a general lack of structure will make it very hard to understand what you're doing. Your README should describe this structure.
Submitting
To submit the assignment, you must do three things:
- Use Mercurial to commit your code.
- Tag your code (for each phase) so that regardless of other updates, the submitted version of each phase has a consistent tag applied to each file. In mercurial parlance, this creates a changeset, and itself needs to be pushed to the repository. If you have not created a tag before, you tag your repository by issuing the command:
hg tag project2phase1-submit
- Push your code, so that the Denison CS repository storage gets the up-to-date version.
Collaboration policy
You should direct most questions to Piazza, but should not post source code there.
You must write all the code you hand in for the programming assignments, except for code that we give you as part of the assignment and system library code. You are not allowed to show your code to anyone else in the class or look at anyone else's solution. You may discuss the assignments with other students, but do not copy each others' code.
F.A.Q.
The following is merely a working list of logistical or anticipated questions.
- Is using c99 syntax ok? How do I use c99 syntax?
Yes. To tell gcc to use c99, add -std=c99 to CFLAGS in the Makefile.
- Can we assume that the UDP packet length received is correct?
No. As stated above, you must examine the length field, and should not assume that the UDP packet you receive is the correct length. The network might truncate or pad packets.
- How are rel_output, conn_output, and conn_bufspace related?
conn_output -- outputs data (to STDOUT or a TCP connection, depending on the mode of operation). Call this function with received data to be delivered to the application.
conn_bufspace -- returns the space available for use with conn_output. Calls to conn_output have limited space, as there is an underlying buffer. If you call conn_output with more data than it can handle, conn_output may return that it has accepted fewer bytes. In order to avoid passing in too much data, call conn_bufspace to find out how much space is available.
rel_output -- called by the library when output has drained. Once the data passed in to conn_output is all sent, the library will call rel_output in order to continue processing any more data available.
- Should Acks be piggybacked on top of outgoing data packets?
Piggybacking Acks is preferable but not required.
- Say packets 1-5 are received. I output 1-3, but I do not have space to output 4 and 5 (even though I have them buffered). What should I do?
Send an Ack with ackno 6 only once you have had space to output packets 4 and 5; until then, acknowledge only what you have output (ackno 4). This helps to rate limit the sender.
- Do we retransmit each packet individually or always just retransmit all unacknowledged packets?
Either is fine for correctness, but the preferred thing is to retransmit just one packet (rather than a whole window)
- In server mode, where does the server get data that needs to be sent?
The server gets the data that it needs to send from packets that arrive in rel_demux.
- Once the server gets a data packet to send, how does it know where to send it?
The server manages multiple connections between many clients (one client per entry in the rel_t linked list). It sends the packet that it receives out through a call to conn_output. Use the connection from the correct rel_t struct as the first argument for conn_output.
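The flow-control answer in the F.A.Q. above (packets 1-5 buffered, room for only 1-3) can be worked through with a small function that decides how many buffered in-order packets fit in the output buffer and what ackno follows. This is a sketch under assumptions: ackno_after_output is a hypothetical name, bufspace stands for the value conn_bufspace would report, and packets are delivered whole or not at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* sizes[i] is the payload size of the buffered in-order packet with seqno
 * first_seqno + i. Returns the ackno to send; if `delivered` is not NULL it
 * receives the number of packets that fit in `bufspace` bytes. */
uint32_t ackno_after_output(uint32_t first_seqno, const size_t *sizes,
                            size_t count, size_t bufspace, size_t *delivered)
{
    size_t i = 0;

    assert(sizes != NULL || count == 0);
    while (i < count && sizes[i] <= bufspace) {
        bufspace -= sizes[i];   /* a real receiver would call conn_output here */
        i++;
    }
    if (delivered != NULL)
        *delivered = i;
    return first_seqno + (uint32_t)i;
}
```

With five 500-byte packets starting at seqno 1 and 1500 bytes of space, three packets are delivered and the ackno is 4. Once rel_output reports that the output has drained and 1000 bytes are free, the remaining two packets go out and the ackno advances to 6.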
