Analyzing Web Page Load Times and File Sizes in Python
In this article, we'll explore a Python script that analyzes the load times and file sizes of CSS, JavaScript, and image files for a given web page. We'll use the BeautifulSoup library to parse HTML content, the requests library to fetch resources, and the openpyxl library to store the results in an Excel file.
Package Management and Dependencies
We'll use Poetry as a dependency management and packaging tool. To start, install Poetry using the official installation guide: https://python-poetry.org/docs/#installation.
Once you have Poetry installed, create a new project:
poetry new web_load_times
cd web_load_times
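Depending on your Poetry version, the generated project layout will look roughly like this (newer versions may use a src/ layout instead):

web_load_times
├── pyproject.toml
├── README.md
├── web_load_times
│   └── __init__.py
└── tests
    └── __init__.py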
Next, add the required dependencies:
poetry add requests beautifulsoup4 openpyxl
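Poetry records these dependencies in pyproject.toml. The version numbers below are illustrative; yours will reflect the latest releases at install time:

[tool.poetry.dependencies]
python = "^3.10"
requests = "^2.31"
beautifulsoup4 = "^4.12"
openpyxl = "^3.1"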
Writing the Script
Inside the inner web_load_times package folder (the one containing __init__.py), create a new file named load_times.py. This file will contain the main script.
We start by importing the required libraries:
import requests
import time
from bs4 import BeautifulSoup
from openpyxl import Workbook
from urllib.parse import urljoin
The format_size function formats the file size in bytes, kilobytes, or megabytes based on its value:
def format_size(size):
    for unit in ['B', 'KB', 'MB']:
        if size < 1024.0:
            break
        size /= 1024.0
    return f"{size:.2f} {unit}"
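A quick check in the Python REPL shows how the function scales its output:

>>> format_size(512)
'512.00 B'
>>> format_size(2048)
'2.00 KB'
>>> format_size(5 * 1024 * 1024)
'5.00 MB'

Note that the loop stops at megabytes, so anything a gigabyte or larger is still reported in MB; for typical page assets that's not a concern.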
The main function performs the following tasks:
Takes the URL input.
Fetches the web page and calculates the complete page load time.
Parses the web page content using BeautifulSoup.
Creates an Excel workbook with appropriate headers.
Iterates through the relevant HTML elements (CSS, JavaScript, and images), calculates their load times, and writes the results to the Excel file.
Here's the main function:
def main():
    # (1) Take URL input
    url = input("Please enter the URL: ")

    # (2) Fetch the web page and calculate the complete page load time
    start_time = time.time()
    response = requests.get(url)
    page_load_time = (time.time() - start_time) * 1000

    # (3) Parse the web page content using BeautifulSoup
    soup = BeautifulSoup(response.content, "html.parser")

    # (4) Create an Excel workbook with appropriate headers
    wb = Workbook()
    ws = wb.active
    ws.title = "Load Times"
    ws["A1"] = "Element"
    ws["B1"] = "URL"
    ws["C1"] = "Load Time (ms)"
    ws["D1"] = "Size"

    # (5) Iterate through the relevant HTML elements and write results to Excel file
    # ...

    # Save the Excel workbook
    wb.save("load_times.xlsx")
    print("Loading times and sizes saved in load_times.xlsx")
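Step (5) is elided above. As one possible implementation (a minimal sketch, not necessarily the article's final code: the tag-to-attribute mapping, the stylesheet filter, and the row counter are choices made for this illustration), you could time a fresh GET request for each stylesheet, script, and image and append a row per resource. This block slots into main() in place of the placeholder comment:

    # (5) A sketch: iterate over stylesheets, scripts, and images,
    # timing a separate GET request for each resource.
    row = 2
    for tag, attr in [("link", "href"), ("script", "src"), ("img", "src")]:
        for element in soup.find_all(tag):
            resource_url = element.get(attr)
            if not resource_url:
                continue
            # For <link> tags, only consider stylesheets
            if tag == "link" and "stylesheet" not in (element.get("rel") or []):
                continue
            full_url = urljoin(url, resource_url)
            start = time.time()
            resource = requests.get(full_url)
            load_time = (time.time() - start) * 1000
            ws.cell(row=row, column=1, value=tag)
            ws.cell(row=row, column=2, value=full_url)
            ws.cell(row=row, column=3, value=round(load_time, 2))
            ws.cell(row=row, column=4, value=format_size(len(resource.content)))
            row += 1

Timing separate requests approximates each resource's load cost; it won't match a browser's parallel fetching exactly, but it's good enough for spotting heavy assets.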
The complete load_times.py script can be found here.
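Since the script is meant to be run directly, load_times.py should also end with the standard entry-point guard:

if __name__ == "__main__":
    main()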
Writing Tests
For testing, we'll use Python's built-in unittest library. Inside the tests folder of the web_load_times project, create a file named test_load_times.py. This file will contain the test cases for our script.
First, import the required libraries and the format_size function from the load_times script:
import unittest
from web_load_times.load_times import format_size
Now, create a test case class TestLoadTimes that inherits from unittest.TestCase. We'll write a test for the format_size function to ensure it correctly formats file sizes:
class TestLoadTimes(unittest.TestCase):
    def test_format_size(self):
        self.assertEqual(format_size(123), "123.00 B")
        self.assertEqual(format_size(1234), "1.21 KB")
        self.assertEqual(format_size(1234567), "1.18 MB")
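It's also worth pinning down the boundaries between units; these assertions follow directly from the function's logic:

    def test_format_size_boundaries(self):
        self.assertEqual(format_size(0), "0.00 B")
        self.assertEqual(format_size(1023), "1023.00 B")
        self.assertEqual(format_size(1024), "1.00 KB")
        self.assertEqual(format_size(1024 * 1024), "1.00 MB")

Add this method to the same TestLoadTimes class.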
To run the tests, execute the following command in the terminal:
poetry run python -m unittest
This will run the test suite and report the results.
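With both tests in place, the output should look something like this (the exact timing will differ):

..
----------------------------------------------------------------------
Ran 2 tests in 0.001s

OK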
Conclusion
In this article, we've learned how to create a Python script that analyzes web page load times and file sizes for CSS, JavaScript, and image files. We've used Poetry for dependency management and packaging, and written tests using the unittest library. This script can be a helpful tool for optimizing web page performance and understanding which resources have the most impact on load times.