Analyzing Web Page Load Times and File Sizes in Python

In this article, we'll explore a Python script that analyzes the load times and file sizes of CSS, JavaScript, and image files for a given web page. We'll use the BeautifulSoup library to parse HTML content, the requests library to fetch resources, and the openpyxl library to store the results in an Excel file.

Package Management and Dependencies

We'll use Poetry as a dependency management and packaging tool. To start, install Poetry using the official installation guide: https://python-poetry.org/docs/#installation.

Once you have Poetry installed, create a new project:

poetry new web_load_times
cd web_load_times
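
Poetry scaffolds a standard project layout. Depending on your Poetry version, it will look roughly like this:

web_load_times
├── pyproject.toml
├── README.md
├── web_load_times
│   └── __init__.py
└── tests
    ├── __init__.py
    └── test_web_load_times.py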

Next, add the required dependencies:

poetry add requests beautifulsoup4 openpyxl

Writing the Script

Inside the inner web_load_times package folder (the one containing __init__.py), create a new file named load_times.py. This file will contain the main script.

We start by importing the required libraries:

import requests
import time
from bs4 import BeautifulSoup
from openpyxl import Workbook
from urllib.parse import urljoin

The format_size function formats the file size in bytes, kilobytes, or megabytes based on its value:

def format_size(size):
    """Return a size in bytes as a human-readable B, KB, or MB string."""
    for unit in ['B', 'KB']:
        if size < 1024.0:
            return f"{size:.2f} {unit}"
        size /= 1024.0
    # Anything of 1 MB or more is reported in MB
    return f"{size:.2f} MB"

The main function performs the following tasks:

  1. Takes the URL input.

  2. Fetches the web page and calculates the complete page load time.

  3. Parses the web page content using BeautifulSoup.

  4. Creates an Excel workbook with appropriate headers.

  5. Iterates through the relevant HTML elements (CSS, JavaScript, and images), calculates their load times, and writes the results to the Excel file.

Here's the main function:

def main():
    # (1) Take URL input
    url = input("Please enter the URL: ")

    # (2) Fetch the web page and calculate the complete page load time
    start_time = time.time()
    response = requests.get(url)
    page_load_time = (time.time() - start_time) * 1000

    # (3) Parse the web page content using BeautifulSoup
    soup = BeautifulSoup(response.content, "html.parser")

    # (4) Create an Excel workbook with appropriate headers
    wb = Workbook()
    ws = wb.active
    ws.title = "Load Times"
    ws["A1"] = "Element"
    ws["B1"] = "URL"
    ws["C1"] = "Load Time (ms)"
    ws["D1"] = "Size"

    # (5) Iterate through the relevant HTML elements and write the results
    # to the Excel file (a sketch of this loop follows after the function)
    # ...

    # Save the Excel workbook
    wb.save("load_times.xlsx")
    print("Loading times and sizes saved in load_times.xlsx")

The complete load_times.py script can be found here.

Writing Tests

For testing, we'll use Python's built-in unittest library. Inside the tests folder of the web_load_times project, create a file named test_load_times.py. This file will contain the test cases for our script.

First, import the required libraries and the format_size function from the load_times script:

import unittest
from web_load_times.load_times import format_size

Now, create a test case class TestLoadTimes that inherits from unittest.TestCase. We'll write a test for the format_size function to ensure it correctly formats file sizes:

class TestLoadTimes(unittest.TestCase):

    def test_format_size(self):
        self.assertEqual(format_size(123), "123.00 B")
        self.assertEqual(format_size(1234), "1.21 KB")
        self.assertEqual(format_size(1234567), "1.18 MB")
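
Optionally, add the standard unittest entry point at the bottom of the test file, so the tests can also be invoked by running the file itself (assuming the web_load_times package is importable, for example after poetry install):

if __name__ == "__main__":
    unittest.main()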

To run the tests, execute the following command in the terminal:

poetry run python -m unittest

This will run the test suite and report the results.
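
With a single passing test, the output looks something like this (timing will vary):

.
----------------------------------------------------------------------
Ran 1 test in 0.001s

OK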

Conclusion

In this article, we've learned how to create a Python script that analyzes web page load times and file sizes for CSS, JavaScript, and image files. We've used Poetry for dependency management and packaging, and written tests using the unittest library. This script can be a helpful tool for optimizing web page performance and understanding which resources have the most impact on load times.
