Capturing Web Traffic
Table of Contents
- Introduction
- Experiment Environment
- Experiment 1. Creating Two Linux Systems on Virtual Machines
- Experiment 2. Running Python Web Server and Client
- Experiment 3. Capturing Web Traffic
- Experiment 4. Writing Python Program to Capture Web Traffic
Introduction
This experiment offers hands-on experience to help achieve this module’s learning objectives. Upon completing this experiment and a few related ones in future lessons, we should be able to
- capture and analyze TCP/IP network packets,
- understand the overall architecture of the Internet,
- have an overview of the operations of TCP/IP networks,
- describe the TCP/IP architecture and explain the functioning of each layer, and
- understand the operations of the protocols of the Internet.
Experiment Environment
The experiment environment consists of a host system, Linux systems on virtual machines, an X server on the host, and a secure shell client on the host. We introduce each of these below.
Oracle VM VirtualBox
Virtualisation software allows us to run multiple virtual machines on a host computer, and with these virtual machines we can build and experiment with computer networks on the host system. You will download two pieces of software from the Oracle VM VirtualBox Web site.
- Download a VirtualBox platform package that matches your operating system from https://www.virtualbox.org/wiki/Downloads and install it.
- Download the VirtualBox Extension Pack from the same Web page and install it.
Linux System
The instructor prepared a prebuilt Oracle VM VirtualBox Linux system image with the necessary packages installed. This prebuilt Linux system image does not have any GUI desktop software (such as Gnome or KDE) installed. As such, it requires very little memory (~190 MB) to run, and is thus suitable for running multiple instances of the system on virtual machines on an ordinary desktop or laptop computer that supports virtualisation. Download the image from
In Experiment 1, Creating Two Linux Systems on Virtual Machines, we will use this image to create two or more Linux systems on virtual machines. The image is based on an old Debian Linux release. In order to install software packages, you should first run the following command when the system is up:
sudo apt-get update --allow-releaseinfo-change
X Server
The prebuilt Linux system does not have any GUI desktop software installed. To display graphics, we need to install an X server on the host system.
Windows
There are a few free X servers available for Windows. The instructor recommends vcXsrv, an X server based on the X.org Foundation’s source code and built using Microsoft’s Visual Studio (Visual C++). Download it from
To launch the X server, use the XLaunch shortcut created during the installation; the default settings are appropriate for us.
Mac OS X
Apple created the XQuartz project and relies on a community effort to further develop and support X11 on Mac. If you are using a Mac OS X system as the host computer, download and install
Linux
Linux comes with an X server if you have GUI desktop software like Gnome or KDE installed. If you already have one of these running, you don’t need to install additional X server software for this exercise.
Secure Shell Client
We use a Secure Shell client to access the Linux systems on the virtual machines. Doing so has at least three advantages.
- We can easily have multiple terminals to a Linux system.
- We can conveniently copy and paste text among the Linux terminals and the host computer.
- Last but not least, we can use the Secure Shell client’s X11 Forwarding feature to display the Linux systems’ graphics on the display provided by the X server on the host computer.
On Windows, many people choose PuTTY, a small and elegant secure shell client that has been updated continuously over the years. If you don’t have an up-to-date secure shell client on your Windows system, you should download and install it.
Microsoft has shipped the OpenSSH client as an optional feature of its Windows 10 operating systems since December 2017. If you are using a recent release of Windows 10 or Windows 11, follow the navigation path Settings, Apps, and Optional Features on the user interface to verify whether the client is already installed, to determine whether it is available in your particular release of Windows, or to install it. If the user interface does not list it, select Add a feature.
On Unix systems like OS X, Linux, and Solaris, Secure Shell clients are part of the system. If you are using one of those for this class, you already have one.
Experiment 1. Creating Two Linux Systems on Virtual Machines
We now create two Linux systems.
- Extract the downloaded Linux system virtual machine image.
- Open VirtualBox, and use the menu Machine | Add to create and add the first Linux system.
- Use VirtualBox to create a linked clone of the virtual machine of the first Linux system. When cloning the virtual machine, choose Machine | Clone from the menu, then select the option to generate new MAC addresses for the network adapters and select the linked clone type.
- To differentiate these Linux systems easily, we should change the host name of the newly created system to something different. To do this, we edit two files on the newly created Linux system (the linked clone), `/etc/hosts` and `/etc/hostname`. For instance, we replace `brooklyn` in `/etc/hosts` by `midwood`, replace `brooklyn` in `/etc/hostname` by `midwood`, and reboot the system. You should observe that the hostname displayed on the command prompt changes to `midwood`.
For convenience, we call these two Linux systems running on virtual machines `brooklyn` and `midwood`.
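After the reboot, the new name can also be checked from Python. The following is a minimal sketch using only the standard library; the helper name `hostname_matches` is hypothetical and chosen for illustration.

```python
import socket


def hostname_matches(expected: str) -> bool:
    """Return True when the running system reports `expected` as its host name."""
    # socket.gethostname() asks the operating system for its current host name.
    return socket.gethostname() == expected


if __name__ == "__main__":
    print("host name reported by the system:", socket.gethostname())
```

On the linked clone, `hostname_matches("midwood")` should return `True` once both files have been edited and the system has rebooted.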
Exercise 1. Exercises and Explorations
Let’s try the following.
- Can you show the network interface cards (NICs) and their configurations on the Linux systems? (hint: use `ip address show`)
- How many NICs are there? What are their link addresses, and what are their IP addresses?
- How do those fit in the diagram of the layered protocol architecture that we discussed in class?
- Can you open multiple terminals to a single Linux system using a Secure Shell (SSH) client? How does the SSH client fit in the diagram of the layered protocol architecture? (hint: we shouldn’t confuse the SSH client with the SSH protocol, an application layer protocol)
- Upon successfully opening a terminal via the SSH client to the Linux system, can you picture how the data (like your keystrokes, or the output of a command you run) would flow in the diagram of the layered protocol architecture?
- Can you list the programs (or processes) that use TCP? (hint: `sudo netstat -a -n -p -t`)
- Can you list the programs (or processes) that use UDP? (hint: `sudo netstat -a -n -p -u`)
- Can you run `evince`, a GUI application installed on the Linux virtual machine? (hint: you need to run `putty -X brooklyn@IP_ADDRESS`, or `ssh -Y brooklyn@IP_ADDRESS`, or `ssh -X brooklyn@IP_ADDRESS`. For PuTTY, you can do it equivalently via its GUI, i.e., on the PuTTY Configuration dialogue window, enter the IP address, expand SSH on the left pane, select X11, check Enable X11 forwarding, and then click Open.)
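As a complement to `ip address show`, the interface names can also be listed from Python. This is a minimal sketch using the standard library's `socket.if_nameindex()`; it reports only the names, not the link or IP addresses, so it does not replace the command in the hint. The helper name `interface_names` is hypothetical.

```python
import socket


def interface_names():
    """Return the names of the network interfaces known to the operating system."""
    # socket.if_nameindex() yields (index, name) pairs, one per interface.
    return [name for _index, name in socket.if_nameindex()]


if __name__ == "__main__":
    for name in interface_names():
        print(name)
```

Comparing this list with the output of `ip address show` is a quick way to confirm that you are counting the same interfaces.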
Experiment 2. Running Python Web Server and Client
In this experiment, we run `hello.py`, the simple Web server, on one Linux system, and `helloclient.py`, the simple Web client, on another Linux system. At host `brooklyn`, run the Web server, and at host `midwood`, run the client.
The Python source code of the Web server and the client is as follows. However, we cannot run them successfully on two systems without modifications.
hello.py
from flask import Flask

app = Flask(__name__)


@app.route("/")
def hello():
    return "Hello, World!"
helloclient.py
from http import client as httpclient


def main():
    conn = httpclient.HTTPConnection('127.0.0.1:5000')
    conn.request('GET', '/')
    response = conn.getresponse()
    print(response.status, response.reason)
    received = response.read()
    print(received)
    conn.close()


if __name__ == "__main__":
    # execute only if run as a script
    main()
Exercise 2. Exercises and Explorations
Regardless of the result you observe, let’s consider the following questions or tasks.
- What is 127.0.0.1? What is 5000? Do you recognize any of these from Experiment 1?
- If you observe an error, what hypotheses can you think of to explain why the client failed to print `Hello, World!`? (hint: picture the data flow in the diagram of the layered protocol architecture)
- What do we have to modify so that they can run successfully? (hint: this is one of the methods to validate or invalidate a hypothesis)
- Finally, you have made it work. Now how do you figure out in which process the server (`hello.py`) is running, and at what end point the client is? (hint: use `netstat`; you may need to revise the `helloclient.py` program so that the Linux system gives you more time to answer this question; the end point is something you have observed in Exercise 1)
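To make the notion of an end point concrete, the sketch below opens one TCP connection inside a single machine and reports the (IP address, port) pair that each side sees. It is a minimal illustration using only the standard `socket` module, not part of `hello.py` or `helloclient.py`; the function name `local_endpoints` is hypothetical, and binding to port 0 is an assumption made here so that the operating system picks a free port.

```python
import socket


def local_endpoints():
    """Open one TCP connection on this machine and return both end points."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        # Port 0 asks the operating system to choose any free port.
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        server_endpoint = listener.getsockname()

        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
            client.connect(server_endpoint)
            accepted, peer = listener.accept()
            with accepted:
                return {
                    "server": server_endpoint,
                    "client": client.getsockname(),
                    "client_as_seen_by_server": peer,
                }


if __name__ == "__main__":
    for role, endpoint in local_endpoints().items():
        print(role, endpoint)
```

The client never chooses its own port in this sketch, yet it still has one. That pair is what `netstat` lists for a live connection.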
Experiment 3. Capturing Web Traffic
We want to capture data exchanged between the Web server and the client. To capture network traffic, we use `ScaPy`. There are two ways to use `ScaPy`.
- Run the `scapy3` application provided by the `ScaPy` software.
- Write our own Python program with the API provided by `ScaPy`.
We begin with the first approach.
Using the `scapy3` Command to Capture Web Traffic
This consists of basically four steps.
- On host `brooklyn`, start `scapy3` and run the `sniff` command in `scapy3`. You need to run this as root, like

    brooklyn:~$ xauth list $DISPLAY
    brooklyn/unix:11  MIT-MAGIC-COOKIE-1  1234560738238201ef
    brooklyn:~$ sudo -s
    [sudo] password for brooklyn:
    # xauth add brooklyn/unix:11 MIT-MAGIC-COOKIE-1 1234560738238201ef
    # scapy3
    >>> packets = sniff(prn=lambda x: x.summary(), filter='tcp port 5000')

- On host `midwood`, start `scapy3` and run the `sniff` command in `scapy3`. You need to run this as root, like

    midwood:~$ xauth list $DISPLAY
    midwood/unix:11  MIT-MAGIC-COOKIE-1  1234560738238201ef
    midwood:~$ sudo -s
    [sudo] password for midwood:
    # xauth add midwood/unix:11 MIT-MAGIC-COOKIE-1 1234560738238201ef
    # scapy3
    >>> packets = sniff(prn=lambda x: x.summary(), filter='tcp port 5000')

In this step, you may have to specify the network interface. For instance, if the network interface that is assigned the IP address referenced in the Python code is `enp0s8`, you can capture packets via:
>>> packets = sniff(prn=lambda x: x.summary(), filter='tcp port 5000', iface='enp0s8')
- Run the Web server (`hello.py`).
- Run the Web client (`helloclient.py`).
Exercise 3. Exercises and Explorations
Let’s do the following exercises, and observe the results.
Examining Packet Headers and Layering
Before you proceed, make sure that you are connected to the Linux system with X11 Forwarding enabled, as you did in Exercise 1.
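One quick check is whether the `DISPLAY` environment variable is set in the terminal session, since the `xauth list $DISPLAY` step above relies on it. The following is a minimal sketch with a hypothetical helper name, `x11_display`; a set `DISPLAY` is a necessary sign of X11 forwarding, not proof that graphics will render.

```python
import os


def x11_display(environ=os.environ):
    """Return the value of DISPLAY, or None when it is unset or empty."""
    return environ.get("DISPLAY") or None


if __name__ == "__main__":
    display = x11_display()
    if display is None:
        print("DISPLAY is not set; reconnect with X11 forwarding enabled")
    else:
        print("DISPLAY =", display)
```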
In `scapy3`, we can press CTRL-C to stop the running `sniff` and then do something like the following:
CTRL-C
>>> packets[0].pdfdump(layer_shift=1)
>>> hexdump(packets[3])
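The bytes that `hexdump` prints are the protocol headers laid end to end, one layer after another. The sketch below shows how the IPv4 and TCP fields can be read out of such bytes with the standard `struct` module. It does not use ScaPy. The sample packet is hand-built for illustration only: the addresses come from the 192.0.2.0/24 documentation range, the source port 49152 is arbitrary, the checksums are left as zero, and all function names are hypothetical.

```python
import socket
import struct


def build_sample_packet() -> bytes:
    """Build an illustrative IPv4 header followed by a TCP header (no payload)."""
    ip_header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 40, 1, 0,      # version 4 + header length 5 words, TOS, total length, id, flags
        64, 6, 0,               # TTL, protocol 6 (TCP), checksum not computed
        socket.inet_aton("192.0.2.1"),
        socket.inet_aton("192.0.2.2"),
    )
    tcp_header = struct.pack(
        "!HHLLBBHHH",
        49152, 5000,            # source port, destination port
        0, 0,                   # sequence and acknowledgement numbers
        5 << 4, 0x02,           # data offset 5 words, SYN flag
        8192, 0, 0,             # window, checksum not computed, urgent pointer
    )
    return ip_header + tcp_header


def parse_ipv4_header(data: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header."""
    (ver_ihl, _tos, total_len, _ident, _flags_frag,
     ttl, proto, _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", data[:20])
    return {
        "version": ver_ihl >> 4,
        "header_len": (ver_ihl & 0x0F) * 4,   # stored as a count of 32-bit words
        "total_len": total_len,
        "ttl": ttl,
        "protocol": proto,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }


def parse_tcp_ports(segment: bytes) -> tuple:
    """Return (source port, destination port) from the start of a TCP header."""
    return struct.unpack("!HH", segment[:4])


if __name__ == "__main__":
    packet = build_sample_packet()
    ip = parse_ipv4_header(packet)
    print(ip)
    print(parse_tcp_ports(packet[ip["header_len"]:]))
```

The TCP header starts exactly where the IPv4 header ends, which is why the IPv4 header length is needed before the ports can be read.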
Running Multiple Web Servers
Modify the Web server and run multiple instances on a single host. Observe packet headers of the captured packets.
Experiment 4. Writing Python Program to Capture Web Traffic
Below is a very simple packet-capturing Python program that uses the ScaPy API.
webcapture.py
from scapy.sendrecv import sniff
from scapy.utils import wrpcap


def main():
    packets = sniff(prn=lambda x: x.summary(), filter="tcp port 80", count=12)
    wrpcap('hello.pcap', packets)


if __name__ == "__main__":
    main()
However, this program won’t capture anything for our Web server and client.
Exercise 4. Exercises and Explorations
Developing Hypotheses
List hypotheses for why the program failed to capture any packets.
Validating Hypotheses
How do you collect evidence to validate or invalidate each hypothesis?
Fixing the Program
How do you fix the program? What were your attempts?
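One way to check an attempt is to count the packet records in the file that `webcapture.py` writes. The sketch below does this with the standard `struct` module only. It assumes that `hello.pcap` is in the classic pcap format (a 24-byte global header followed by records that each start with a 16-byte header) with microsecond timestamps; other capture formats, such as pcapng, are rejected. The function name `count_pcap_records` is hypothetical.

```python
import struct

PCAP_MAGIC_LITTLE_ENDIAN = b"\xd4\xc3\xb2\xa1"
PCAP_MAGIC_BIG_ENDIAN = b"\xa1\xb2\xc3\xd4"


def count_pcap_records(data: bytes) -> int:
    """Count the packet records in the bytes of a classic pcap file."""
    if len(data) < 24:
        raise ValueError("too short to hold a pcap global header")

    # The first four bytes tell us the byte order used by the writer.
    magic = data[:4]
    if magic == PCAP_MAGIC_LITTLE_ENDIAN:
        order = "<"
    elif magic == PCAP_MAGIC_BIG_ENDIAN:
        order = ">"
    else:
        raise ValueError("not a classic pcap file")

    offset = 24  # skip the global header
    count = 0
    while offset + 16 <= len(data):
        # Record header: timestamp seconds, microseconds, captured length, original length.
        _sec, _usec, captured_len, _orig_len = struct.unpack_from(
            order + "IIII", data, offset
        )
        offset += 16 + captured_len
        count += 1
    return count
```

For example, reading the file with `open('hello.pcap', 'rb').read()` and passing the bytes to `count_pcap_records` gives a number that can be compared with the `count` argument passed to `sniff`.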