Capturing Web Traffic

Table of Contents

Introduction
Experiment Environment
Experiment 1. Creating Two Linux Systems on Virtual Machines
- Exercise 1. Exercises and Explorations
Experiment 2. Running Python Web Server and Client
Experiment 3. Capturing Web Traffic
- Using the scapy3 Command to Capture Web Traffic
- Exercise 3. Exercises and Explorations
  - Examining Packet Headers and Layering
  - Running Multiple Web Servers
Experiment 4. Writing Python Program to Capture Web Traffic
- webcapture.py
- Exercise 4. Exercises and Explorations

Introduction

This experiment provides hands-on experience to help achieve this module’s learning objectives. Upon completing this experiment and a few related ones in future lessons, we should be able to

capture and analyze TCP/IP network packets,
understand the overall architecture of the Internet,
have an overview of the operations of TCP/IP networks,
describe the TCP/IP architecture and explain the functioning of each layer, and
understand the operations of the protocols of the Internet.

Experiment Environment

The experiment environment consists of a host system, Linux systems on virtual machines, an X server on the host, and a secure shell client on the host. We introduce each of these below.

Oracle VM VirtualBox

Virtualisation software allows us to run multiple virtual machines on a host computer. With these virtual machines, we can build and experiment with computer networks on the host system. You will download two pieces of Oracle VM VirtualBox software.

Download a VirtualBox platform package that matches your operating system from https://www.virtualbox.org/wiki/Downloads and install it.
Download the VirtualBox Extension Pack from the same Web page and install it.

Linux System

The instructor has prepared a prebuilt Oracle VM VirtualBox Linux system image with the necessary packages installed. This prebuilt image does not have any GUI desktop software (such as Gnome or KDE) installed. As a result, it requires very little memory (~190 MB) to run, which makes it suitable for running multiple instances on virtual machines on an ordinary desktop or laptop computer that supports virtualisation. Download the image from

Oracle VM VirtualBox Linux System Image via CUNY Blackboard

Later, in Experiment 1 (Creating Two Linux Systems on Virtual Machines), we will use this image to create two or more Linux systems on virtual machines.

X Server

The prebuilt Linux system does not have any GUI desktop software installed. To display graphics, we need to install an X server on the host system.

Windows

Several free X servers are available for Windows. The instructor recommends vcXsrv, an X server based on the X.org Foundation’s source code and built using Microsoft’s Visual Studio (Visual C++). Download it from

ArcticaProject’s vcXsrv release on GitHub

To launch the X server, use the XLaunch shortcut created during the installation. The default settings are appropriate for us.

Mac OS X

Apple created the XQuartz project and relies on a community effort to further develop and support X11 on Mac. If you are using a Mac OS X system as the host computer, download and install

XQuartz

Linux

Linux comes with an X server if you have GUI desktop software like Gnome or KDE installed. If you already have one of these running, you don’t need to install additional X server software for this exercise.

Secure Shell Client

We use a Secure Shell client to access the Linux systems on the virtual machines. Doing so has at least three advantages.

We can easily have multiple terminals to a Linux system.
We can conveniently copy and paste text among the Linux terminals and the host computer.
Last but not least, we can use the Secure Shell client’s X11 Forwarding feature to display the Linux systems’ graphics on the display provided by the X server on the host computer.

On Windows, many people choose PuTTY, a small and elegant secure shell client that has received continuous updates over the years. If you don’t have an up-to-date secure shell client on your Windows system, you should download and install PuTTY.

Microsoft has shipped the OpenSSH client as an optional feature of Windows 10 since December 2017. If you are using a recent release of Windows 10, go to Settings, Apps, and then Optional Features to check whether the client is already installed. If it is not listed, select Add a feature to see whether it is available for your release and to install it.

On Unix systems like OS X, Linux, and Solaris, Secure Shell clients are part of the system. If you are using one of those for this class, you already have one.

Experiment 1. Creating Two Linux Systems on Virtual Machines

We now create two Linux systems.

Extract the downloaded Linux system virtual machine image.
Open VirtualBox, and use the menu Machine | Add to add the first Linux system.
Use VirtualBox to create a linked clone of the first Linux system’s virtual machine. To clone the virtual machine, choose Machine | Clone from the menu, choose to generate new MAC addresses for the network adapters, and select the linked clone type.
To tell these Linux systems apart easily, we change the host name of the newly created system (the linked clone). To do this, we edit two files on that system, /etc/hosts and /etc/hostname. For instance, we replace brooklyn with midwood in both /etc/hosts and /etc/hostname and reboot the system. The hostname displayed on the command prompt should then change to midwood.

For convenience, we call these two Linux systems running on virtual machines brooklyn and midwood.

Exercise 1. Exercises and Explorations

Let’s try the following:

Can you show the network interface cards (NICs) and their configurations on the Linux systems? (hint: using ip address show)
How many NICs are there? What are their link addresses, and what are their IP addresses?
How do those fit in the diagram of the layered protocol architecture that we discussed in class?
Can you open multiple terminals to a single Linux system using a Secure Shell (SSH) client? How does the SSH client fit in the diagram of the layered protocol architecture? (hint: we shouldn’t confuse the SSH client with the SSH protocol, an application layer protocol)
Upon successfully opening a terminal via the SSH client to the Linux system, can you picture how the data (like your key strokes, the output of a command you run) would flow in the diagram of the layered protocol architecture?
Can you list the programs (or processes) that use TCP? (hint: sudo netstat -a -n -p -t)
Can you list the programs (or processes) that use UDP? (hint: sudo netstat -a -n -p -u)
Can you run evince, a GUI application installed on the Linux virtual machine? (hint: you need to run putty -X brooklyn@IP_ADDRESS, or ssh -Y brooklyn@IP_ADDRESS, or ssh -X brooklyn@IP_ADDRESS. For PuTTY, you can do the equivalent via its GUI: in the PuTTY Configuration dialogue window, enter the IP address, expand SSH in the left pane, select X11, check Enable X11 forwarding, and then click Open.)
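
As a complement to ip address show, the following minimal Python sketch lists the interface names and link (MAC) addresses that the kernel reports. It assumes a Linux system where /sys/class/net is available; the helper names read_link_address and list_interfaces are made up for this illustration. It does not show IP addresses, so ip address show remains the tool for that part of the exercise.

```python
import socket
from pathlib import Path


def read_link_address(name):
    """Return the link (MAC) address of an interface, or None if unavailable.

    Assumption: a Linux system that exposes /sys/class/net/<name>/address.
    """
    path = Path("/sys/class/net") / name / "address"
    try:
        return path.read_text().strip()
    except OSError:
        return None


def list_interfaces():
    """Return (index, name, link_address) for each interface the kernel reports."""
    return [
        (index, name, read_link_address(name))
        for index, name in socket.if_nameindex()
    ]


if __name__ == "__main__":
    for index, name, link_address in list_interfaces():
        print(index, name, link_address)
```

Comparing this output with that of ip address show is one way to check which layer each piece of information belongs to.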

Experiment 2. Running Python Web Server and Client

In this experiment, we run hello.py, the simple Web server on a Linux system, and helloclient.py, the simple Web client on another Linux system.

On host brooklyn, run the Web server, and on host midwood, run the client. The Python source code of the Web server and the client is as follows. However, we cannot run them successfully on two systems without modifications.

`hello.py`

from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello():
    return "Hello, World!"

`helloclient.py`

from http import client as httpclient


def main():
    # Open a connection to the Web server at this address and port
    conn = httpclient.HTTPConnection('127.0.0.1:5000')

    # Send a GET request for the root path and print the reply
    conn.request('GET', '/')
    response = conn.getresponse()
    print(response.status, response.reason)
    received = response.read()
    print(received)

    conn.close()


if __name__ == "__main__":
    # execute only if run as a script
    main()

Exercise 2. Exercises and Explorations

Regardless of the result you observe, let’s consider the following questions or tasks.

What is 127.0.0.1? What is 5000? Do you recognize any of these from Experiment 1?
If you observe an error, what hypotheses can you think of to explain why the client failed to print Hello, World!? (hint: picture the data flow in the diagram of the layered protocol architecture)
What do we have to modify so that they can run successfully? (hint: this is one of the methods to validate or invalidate the hypothesis)
Finally, you have made it work. Now, how do you figure out in which process the server (hello.py) is running, and at what end point the client is? (hint: use netstat; you may need to revise helloclient.py so that the connection stays open long enough for you to answer this question. The end point is something you have observed in Exercise 1.)
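
To make the notion of an end point concrete, here is a minimal, self-contained sketch that runs entirely on one machine. It is not the Flask server from this experiment: it substitutes a tiny server from Python's standard http.server module, bound to the loopback address on a port chosen by the operating system, and then prints the (IP address, port) pair at each end of the client's TCP connection. The names HelloHandler and show_endpoints are made up for this illustration.

```python
import threading
from http import client as httpclient
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Stand-in for hello.py: answers every GET with a fixed greeting."""

    def do_GET(self):
        body = b"Hello, World!"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demonstration output quiet


def show_endpoints():
    # Port 0 asks the operating system to pick any free port.
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    server_ip, server_port = server.server_address

    conn = httpclient.HTTPConnection(server_ip, server_port, timeout=5)
    conn.connect()
    local = conn.sock.getsockname()   # the client's end point
    remote = conn.sock.getpeername()  # the server's end point
    conn.request("GET", "/")
    response = conn.getresponse()
    body = response.read()
    conn.close()

    server.shutdown()
    server.server_close()
    return local, remote, response.status, body


if __name__ == "__main__":
    local, remote, status, body = show_endpoints()
    print("client end point:", local)
    print("server end point:", remote)
    print(status, body)
```

The client's port number typically differs from run to run, because the operating system assigns it when the connection is opened, whereas the server's port is the one it listens on.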

Experiment 3. Capturing Web Traffic

We want to capture data exchanged between the Web server and the client. To capture network traffic, we use ScaPy. There are two ways to use ScaPy.

Run the scapy3 application provided by the ScaPy software.
Write our own Python program with the API provided by ScaPy.

We begin with the first approach.

Using the `scapy3` Command to Capture Web Traffic

This consists of four basic steps.

On host brooklyn, start scapy3 and run the sniff command in scapy3. You need to run this as root, for example:

brooklyn:~$  xauth list $DISPLAY
brooklyn/unix:11  MIT-MAGIC-COOKIE-1  1234560738238201ef
brooklyn:~$ sudo -s
[sudo] password for brooklyn:
# xauth add brooklyn/unix:11  MIT-MAGIC-COOKIE-1 1234560738238201ef  
# scapy3
>>> packets = sniff(prn=lambda x: x.summary(), filter='tcp port 5000')

On host midwood, start scapy3 and run the sniff command in scapy3. You need to run this as root, for example:

midwood:~$  xauth list $DISPLAY
midwood/unix:11  MIT-MAGIC-COOKIE-1  1234560738238201ef
midwood:~$ sudo -s
[sudo] password for midwood:
# xauth add midwood/unix:11  MIT-MAGIC-COOKIE-1 1234560738238201ef  
# scapy3
>>> packets = sniff(prn=lambda x: x.summary(), filter='tcp port 5000')

On host brooklyn, run the Web server (hello.py).
On host midwood, run the Web client (helloclient.py).

Exercise 3. Exercises and Explorations

Let’s do the following exercises, and observe the results.

Examining Packet Headers and Layering

Before you proceed, make sure that you are connected to the Linux system with X11 Forwarding enabled, as you did in Exercise 1.

In scapy3, we can stop the capture with CTRL-C and then inspect the captured packets, for example:

CTRL-C
>>> packets[0].pdfdump(layer_shift=1)
>>> hexdump(packets[3])
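
To relate the bytes of a hex dump to the layered architecture, the following minimal sketch decodes the fixed part of an IPv4 header and of a TCP header using only Python's struct module. It assumes that the byte string starts at the IP layer (a frame captured on Ethernet has a 14-byte link-layer header in front of it) and it ignores header options beyond using the length fields to skip them. The function name parse_ipv4_tcp is made up for this illustration.

```python
import socket
import struct


def parse_ipv4_tcp(raw):
    """Decode IPv4 and TCP header fields from bytes that start at the IP layer."""
    (version_ihl, _tos, total_length, _ident, _flags_frag,
     ttl, protocol, _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    if version_ihl >> 4 != 4 or protocol != 6:
        raise ValueError("not an IPv4 packet carrying TCP")
    # The low 4 bits give the IP header length in 32-bit words.
    ip_header_len = (version_ihl & 0x0F) * 4

    tcp = raw[ip_header_len:ip_header_len + 20]
    (src_port, dst_port, seq, ack, offset_reserved,
     flags, _window, _tcp_checksum, _urgent) = struct.unpack("!HHIIBBHHH", tcp)
    # The high 4 bits give the TCP header length in 32-bit words.
    tcp_header_len = (offset_reserved >> 4) * 4

    return {
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
        "ttl": ttl,
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "flags": flags,
        "payload": raw[ip_header_len + tcp_header_len:total_length],
    }
```

Each layer's header ends where the next layer's begins: the IP header length tells us where the TCP header starts, and the TCP header length tells us where the application data (here, the HTTP message) starts.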

Running Multiple Web Servers

Modify the Web server and run multiple instances on a single host. Observe packet headers of the captured packets.
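
One thing to keep in mind when running several servers on one host is that each listening TCP socket needs its own (IP address, port) pair. The following minimal sketch illustrates this with plain sockets rather than with the Flask server; the helper names try_listen and demo are made up for this illustration, and the loopback address is used only so that the sketch runs on a single machine.

```python
import socket


def try_listen(host, port):
    """Return a listening TCP socket, or None if binding or listening fails
    (for example, because the address is already in use)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen()
    except OSError:
        sock.close()
        return None
    return sock


def demo():
    # Port 0 asks the operating system to pick any free port.
    first = try_listen("127.0.0.1", 0)
    first_port = first.getsockname()[1]

    # A second listener on the same address and port is refused.
    duplicate = try_listen("127.0.0.1", first_port)

    # A second listener on a different port is accepted.
    second = try_listen("127.0.0.1", 0)
    second_port = second.getsockname()[1]

    first.close()
    second.close()
    return first_port, duplicate is None, second_port


if __name__ == "__main__":
    print(demo())
```

In the captured packets, the port numbers in the TCP header are what tell apart the traffic of one server instance from that of another.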

Experiment 4. Writing Python Program to Capture Web Traffic

Below is a minimal packet-capturing Python program that uses the ScaPy API.

`webcapture.py`

from scapy.sendrecv import sniff
from scapy.utils import wrpcap

def main():
	packets = sniff(prn=lambda x: x.summary(), filter="tcp port 80", count=12)
	wrpcap('hello.pcap', packets)
	
if __name__ == "__main__":
	main()

However, this program won’t capture anything for our Web server and client.
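
Whenever a capture file such as hello.pcap does get written, one way to check how many packets it holds, independently of ScaPy, is to walk through the file's records directly. The sketch below assumes the classic pcap file format (a 24-byte file header followed by records that each begin with a 16-byte header), not the newer pcapng format; the function name count_pcap_packets is made up for this illustration.

```python
import struct

PCAP_FILE_HEADER_LEN = 24
PCAP_RECORD_HEADER_LEN = 16


def count_pcap_packets(path):
    """Count the packet records in a classic (non-pcapng) pcap file."""
    with open(path, "rb") as f:
        header = f.read(PCAP_FILE_HEADER_LEN)
        if len(header) < PCAP_FILE_HEADER_LEN:
            raise ValueError("file too short to be a pcap capture")

        # The magic number reveals the byte order used by the writer.
        magic = header[:4]
        if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
            endian = "<"
        elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
            endian = ">"
        else:
            raise ValueError("unrecognised pcap magic number")

        count = 0
        while True:
            record = f.read(PCAP_RECORD_HEADER_LEN)
            if len(record) < PCAP_RECORD_HEADER_LEN:
                break
            # Fields: timestamp seconds, timestamp fraction,
            # captured length, original length.
            _sec, _frac, captured_len, _orig_len = struct.unpack(
                endian + "IIII", record)
            f.seek(captured_len, 1)  # skip over the packet bytes
            count += 1
        return count
```

For example, count_pcap_packets('hello.pcap') returns the number of records stored in that file.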

Exercise 4. Exercises and Explorations

Developing Hypotheses

List hypotheses that could explain why the program failed to capture any packets.

Validating Hypotheses

How do you collect evidence to validate or invalidate each hypothesis?

Fixing the Program

How do you fix the program? What were your attempts?