The Global Internet
Credits: My thanks to Dr. Brighten Godfrey at the University of Illinois Urbana-Champaign and their CS-538, Advanced Computer Networks class. This assignment is adapted, with permission, from one of their introductory assignments.
Background
On January 25, 2011, a popular uprising began in Egypt that would ultimately bring an end to the 29-year regime of Hosni Mubarak. On January 27, 2011, attempting to inhibit the Facebook- and Twitter-organized protests, the Egyptian government shut off essentially all Internet service to the country of 82 million people -- a unique event in the history of the Internet.
Your mission is to answer the question, "How long does it take to sever all global networked communications of the 15th largest country in the world?" Because of the open and decentralized nature of the Internet, you can answer this question using publicly-available Internet routing information.
This web page guides you through the process. However, the main point of this section is to get a feel for the BGP data by visualizing an interesting event in the data set. If you would like to explore an interesting aspect of the data other than what is suggested here, or even explore a completely different event, you are welcome to do so. Check with me first if you have any questions about appropriateness.
Route Views and bgpdump
The Route Views Project collects data on routing behavior in the live Internet and stores these traces for later analysis. Multiple years of data sets are available. To produce the data, Route Views maintains a number of collectors. Each collector has independent BGP connections to the routers of several ISPs. The collectors log two types of data:
- periodic complete snapshots of the collector's entire routing table (Routing Information Base, or RIB), aggregating the full set of advertisements from the multiple BGP peers of the collector; and
- more continuous logs of the BGP update messages received from the neighboring routers.
In addition to the MRT format RIB and UPDATE files from Route Views, we will be making use of a utility called bgpdump that will enable us to process the RIB/UPDATE files into formatted ASCII text. It is this ASCII text output from bgpdump that we will process to answer the question posed above. I have created a simple Python program that can serve as a skeleton for the processing you need to perform.
Start by downloading and building bgpdump and its requisite library, libbgpdump. I used the following distribution: http://www.ris.ripe.net/source/bgpdump/, and downloaded and untarred the 1.4.99.13 version. My test environment was a 32-bit Ubuntu 12.04 desktop system, but I am hopeful that you can build successfully on the Ubuntu 11.10 systems in Olin 219. Let me know if you have any trouble, and I can post the executable and library from my own build.
Once you have downloaded and extracted the libbgpdump code from its archive, you will configure and then make the code (after entering the directory):
- ./configure
- make
If configure or make fails because the bzip2 development library is missing, install it and then repeat the configure and make steps:
- sudo apt-get install libbz2-dev
This should yield the bgpdump executable and its library. This executable takes a single command-line argument specifying the input file for processing and writes its ASCII counterpart to stdout. If you want to save the ASCII output for later processing, simply redirect to a file.
Note that the input file can be either a bzipped (.bz2) MRT file, as downloaded from Route Views, or an un-bzipped file. We can also use bzcat to combine a set of .bz2 files together into a single file to use as input.
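If you would rather drive bgpdump from Python than redirect its output to a file, the following is a minimal sketch of one way to stream the ASCII output line by line. It assumes Python 3.7 or later and that the bgpdump executable you built is on your PATH. The function name stream_lines and the file path in the usage comment are hypothetical.

```python
import subprocess
import sys


def stream_lines(argv):
    """Run a command and yield its stdout one text line at a time.

    Streaming avoids holding the (very large) ASCII output in memory.
    """
    with subprocess.Popen(argv, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            yield line.rstrip("\n")
    # Leaving the 'with' block waits for the process, so returncode is set.
    if proc.returncode != 0:
        raise RuntimeError(f"{argv[0]} exited with status {proc.returncode}")


# Hypothetical usage, assuming bgpdump is on PATH and the snapshot is in /tmp:
#
#     for text in stream_lines(["bgpdump", "/tmp/rib-snapshot.bz2"]):
#         print(text)
```

Because the function is a generator, nothing runs until you iterate over it, and a downstream parser can consume records as bgpdump produces them.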
Take bgpdump and the Route Views data for a spin as follows:
- From Route Views, download the first RIB snapshot from January 27, 2011, from the London Internet Exchange (LINX) collector. Note that this will be a very large file, and should probably be stored in /tmp and cleaned up after this assignment is complete.
- Run bgpdump on the RIB snapshot. From our discussion and from the textbook, try to figure out what each field means. The ASCII version of the file will also be very large, and processing can take a number of minutes.
- Search the bgpdump output to find any RIB entry associated with Denison's IP address range. (Hint: you could do this by finding associated IP prefixes or AS numbers. You can use a whois database (e.g., http://whois.arin.net/ui/) to find the IP prefix and origin AS number associated with an IP address.) Be careful: even if you redirect to a file, the file will be too large for most editors to handle. A simple command-line utility like less supports searching and may solve this problem.
- Can you use the above entry to determine the sequence of ISPs through which packets will flow when traveling from one of the LINX routers in London to one of our Denison machines?
What to submit: The sequence of ISPs (AS numbers and business names) from the last step.
Measuring Route Withdrawals
Next, we want to use the Route Views data to figure out how long it took Egypt to leave the Internet. To do this, we will use an imperfect but simple approach: We will create a histogram counting the number of Egyptian-related prefix withdrawals observed in a sequence of time intervals. Due to the dynamic nature of the Internet, announcements and withdrawals occur continually, even under normal conditions. But by creating the withdrawal histogram, we will see the distribution of Egyptian prefix withdrawals over time, and the spike in that distribution will indicate the period corresponding to Egypt's disconnection from the Internet.
You can think of the processing as proceeding in two phases, although all the processing can, in actuality, happen in a single pass through a combination of the complete RIB and a set of UPDATE files.
In the first phase, the goal is to construct the list of prefixes associated with the Autonomous Systems in Egypt. By examining the records of the RIB snapshot from early on January 27, 2011, we can, for any given prefix, determine the AS that originates that prefix. If the AS belongs to the set of Egyptian ASes, we should record that prefix for later use in phase 2. Note that the snapshot may contain multiple RIB records for any given prefix, because each Route Views collector reports prefix advertisements received from multiple peer BGP routers, so the same prefix should be recorded only once.
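As a rough illustration of phase 1, here is a minimal sketch that collects prefixes from already-parsed records. It assumes each record is a dictionary with PREFIX and ASPATH keys (check your own bgpdump output to confirm the field names), and it treats the last AS in the path as the origin. The names origin_as and collect_prefixes are hypothetical, and the AS numbers are the ones listed in the suggestions further down this page.

```python
# Egyptian AS numbers, kept as strings so they compare directly
# against the tokens of an AS path.
EGYPTIAN_ASES = {"5536", "8452", "24835", "24863", "36992"}


def origin_as(as_path):
    """Return the last token of an AS path string, or None if it is empty.

    Assumption: the origin AS is the rightmost entry. A path that ends in
    an AS set such as "{1,2}" will simply not match any single AS number.
    """
    tokens = as_path.split()
    return tokens[-1] if tokens else None


def collect_prefixes(records, target_ases=EGYPTIAN_ASES):
    """Return the set of prefixes whose origin AS is in target_ases."""
    prefixes = set()  # a set absorbs duplicates reported by multiple peers
    for record in records:
        prefix = record.get("PREFIX")       # assumed field name
        path = record.get("ASPATH", "")     # assumed field name
        if prefix and origin_as(path) in target_ases:
            prefixes.add(prefix)
    return prefixes
```

Using a set means that a prefix reported by several of the collector's peers is stored only once, which is all that phase 2 needs.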
In the second phase, processing proceeds for each record of the set of UPDATE files for the time period in question. Each record includes a TIME field, and, in standard histogram fashion, a data structure should keep track of counts for each interval/bin of time. If, for any record being processed, the record corresponds to one of the Egyptian-originated prefixes (determined in the first phase), the histogram count for the appropriate time interval/bin should be incremented.
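The binning step can be sketched as follows. This is only one possible approach, under several assumptions: records are dictionaries, withdrawn prefixes appear as a list under a WITHDRAW key, and the TIME value looks like "01/27/11 22:12:26" (verify this against your own bgpdump output and adjust TIME_FORMAT if it differs). The function names are hypothetical. The sketch counts each withdrawn Egyptian prefix once, rather than counting each record once.

```python
from collections import Counter
from datetime import datetime, timedelta

# Assumed layout of the TIME field; change it if your output differs.
TIME_FORMAT = "%m/%d/%y %H:%M:%S"


def bin_start(timestamp, start, width):
    """Return the start time of the bin that contains timestamp."""
    index = (timestamp - start) // width  # timedelta // timedelta is an int
    return start + index * width


def withdrawal_histogram(records, prefixes, start, stop, width,
                         time_format=TIME_FORMAT):
    """Count withdrawals of the given prefixes per bin in [start, stop)."""
    counts = Counter()
    for record in records:
        stamp = record.get("TIME")
        if stamp is None:
            continue
        when = datetime.strptime(stamp, time_format)
        if not (start <= when < stop):
            continue
        hits = sum(1 for p in record.get("WITHDRAW", []) if p in prefixes)
        if hits:
            counts[bin_start(when, start, width)] += hits
    return counts
```

Keying the histogram by each bin's start time, rather than by a bin index, makes the final output easy to print in chronological order with sorted(counts.items()).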
In the starter code provided, the input to the program is expected to be the output of the bgpdump program. This can be provided through stdin (with redirection or with a pipe), or it can be read from a file whose name is specified with a -f option to the program. Within the program, the ASCII input is parsed into records, where each bgpdump record is composed of lines of keys and values and is terminated with a blank line. The starter code parses the set of lines making up a record into a dictionary whose keys are the keywords starting the lines, and whose values are the strings completing the lines. In the case of UPDATE records, the value for the ANNOUNCE or WITHDRAW key is a list of prefixes. Be careful not to assume that all prefixes are in IPv4 address format.
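To make the record structure concrete, here is a minimal sketch of a parser along the lines described above. It is not the starter code itself. It assumes that ordinary lines look like "KEY: value", that ANNOUNCE and WITHDRAW appear alone on a line, and that the prefixes belonging to them follow on indented lines. The names parse_records and ip_version are hypothetical.

```python
import ipaddress


def parse_records(lines):
    """Yield one dictionary per blank-line-terminated record."""
    record, list_key = {}, None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line ends the current record.
            if record:
                yield record
            record, list_key = {}, None
        elif line[0].isspace() and list_key:
            # An indented line under ANNOUNCE/WITHDRAW holds one prefix.
            record[list_key].append(line.strip())
        elif line.strip() in ("ANNOUNCE", "WITHDRAW"):
            list_key = line.strip()
            record[list_key] = []
        else:
            # Split on the first colon only, so IPv6 values stay intact.
            key, _, value = line.partition(":")
            record[key.strip()] = value.strip()
            list_key = None
    if record:
        yield record  # the input may not end with a blank line


def ip_version(prefix):
    """Return 4 or 6 for a prefix such as '192.0.2.0/24' or '2001:db8::/32'."""
    return ipaddress.ip_network(prefix, strict=False).version
```

Because parse_records accepts any iterable of lines, it works equally well with sys.stdin or with an open file object. The ip_version helper is one way to tell IPv4 and IPv6 prefixes apart without inspecting the string by hand.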
The rest of the code is up to you. You could continue with a single-pass approach and combine the RIB snapshot along with all the UPDATE files from the interval around the withdrawals to create a single monster input to bgpdump, and then feed all that output to the Python program. Or, you could separate the two phases and define an intermediate file format with the list of prefixes of interest. You will probably want some command line arguments to control the start time, stop time, and histogram interval width for processing.
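One possible shape for those command-line arguments is sketched below using argparse. Only the -f option comes from the starter code description. The --start, --stop, and --width options, their formats, and their defaults are assumptions chosen for illustration, and the function names are hypothetical.

```python
import argparse
from datetime import datetime


def parse_utc(text):
    """Parse a 'YYYY-MM-DD HH:MM' string, interpreted as UTC."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M")


def build_parser():
    """Build the argument parser for the histogram program."""
    parser = argparse.ArgumentParser(
        description="Histogram of prefix withdrawals from bgpdump output.")
    parser.add_argument("-f", "--file", default=None,
                        help="file of bgpdump ASCII output (default: stdin)")
    # The options below are illustrative, not part of the starter code.
    parser.add_argument("--start", type=parse_utc,
                        default=parse_utc("2011-01-27 21:00"),
                        help="start of the window, 'YYYY-MM-DD HH:MM' UTC")
    parser.add_argument("--stop", type=parse_utc,
                        default=parse_utc("2011-01-27 23:00"),
                        help="end of the window, 'YYYY-MM-DD HH:MM' UTC")
    parser.add_argument("--width", type=int, default=60,
                        help="histogram bin width in seconds")
    return parser
```

Calling build_parser().parse_args() with no arguments reads sys.argv, and a malformed time string produces a usage error rather than a traceback, because argparse reports a ValueError raised by a type function as an invalid argument.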
Some suggestions for getting started:
- From Route Views, download the London Internet Exchange (LINX) collector's UPDATES data, from near the time of Egypt's departure from the net. That happened sometime between 21:00 and 23:00 UTC on January 27, 2011, so you'll want to grab all the LINX UPDATES data at least in that interval. For any given UPDATE file, you can test things out by running bgpdump on the file and saving or piping the ASCII output and using it with the provided skeleton. This will report the number of records processed.
- Modify the starter code for phase 1 processing of RIB records so that it collects the set of prefixes associated with Egyptian ASes. (Hint 1: the following Autonomous System (AS) numbers belong to Egyptian ISPs: 5536, 8452, 24835, 24863, and 36992. Hint 2: Think about what part of the BGP RIB record information you can use to figure out which advertisements were originated by Egyptian ASes.)
- Test your phase 1 by feeding it the output of bgpdump on the RIB snapshot and having the code summarize the Egyptian prefixes collected.
- Now move on to phase 2. How you proceed here depends on whether you separated the processing into two parts, but the basic processing should be straightforward: for each update record, look for a match with the set of Egyptian prefixes. If matched, use the TIME field of the record to increment the appropriate histogram bin. At the end, output the histogram information.
