So, I have been letting ChatGPT help me for over 3 days now.
Don't get me wrong! I like writing code, but for some code, small pieces of code, ChatGPT comes in really handy.
I used to use ScreamingFrog for checking the SEO of all kinds of sites. But my license expired, so I was going to renew it.
And then….. the new price! £199!!! That is £59 more than last time; a year ago it was £140.
A £59 increase!! WTF!!! That's over 40%. And yes, they did not raise the price for eight years, but they could have increased it a bit every year, right?
I talked to them, but they did not want to help out. So I started writing code to SEO-scrape my sites with Python. Well... actually, ChatGPT is writing the code for me.
In my last two articles you could read about finding 404 pages, and now I want to find images without an ALT tag.
Let's go! The assignment is:
Write a Python script that searches for all images on a domain without a filled alt tag. The output should be put in a CSV file named "images_without_alt_tags.csv", with the image URL in the first column and the URL where the image is found in the second column. Ask me for the domain by using input, and do not forget that this script has to be recursive.
And off we go!
To achieve this, you can use the 'requests' and 'BeautifulSoup' libraries to fetch and parse the HTML content of the domain. You can then search for images with empty or missing 'alt' attributes and write the results to a CSV file.
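To see what "empty or missing alt" means in practice, here is a minimal sketch with a made-up HTML snippet: an image counts when the alt attribute is absent, empty, or only whitespace.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet: only the last image has a usable alt text.
html = """
<img src="a.png">
<img src="b.png" alt="">
<img src="c.png" alt="  ">
<img src="d.png" alt="A proper description">
"""
soup = BeautifulSoup(html, "html.parser")
missing = [
    img["src"]
    for img in soup.find_all("img")
    # img.get("alt") is None when the attribute is absent;
    # (None or "") turns that into an empty string before stripping.
    if not (img.get("alt") or "").strip()
]
print(missing)  # → ['a.png', 'b.png', 'c.png']
```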
Before you proceed, make sure you have installed the required libraries by running:
pip install requests beautifulsoup4
Here's the Python script for your requirement:
import requests
from bs4 import BeautifulSoup
import csv
from urllib.parse import urljoin


def get_all_links_recursive(url, domain, visited=None):
    # Collect every internal link by following pages recursively.
    if visited is None:
        visited = set()
    if url in visited:
        return visited
    visited.add(url)
    page = requests.get(url)
    soup = BeautifulSoup(page.content, "html.parser")
    for link in soup.find_all("a"):
        href = link.get("href")
        if href:
            abs_url = urljoin(domain, href)
            if abs_url.startswith(domain) and abs_url not in visited:
                visited = get_all_links_recursive(abs_url, domain, visited)
    return visited


def find_images_without_alt_tags(url):
    # Return (image URL, page URL) pairs for images with an empty or missing alt.
    page = requests.get(url)
    soup = BeautifulSoup(page.content, "html.parser")
    images = []
    for img in soup.find_all("img"):
        alt = img.get("alt")
        src = img.get("src")
        if src and (alt is None or alt.strip() == ""):
            images.append((src, url))
    return images


def main():
    domain = input("Enter the domain URL: ")
    all_links = get_all_links_recursive(domain, domain)
    images_without_alt_tags = []
    for link in all_links:
        images_without_alt_tags.extend(find_images_without_alt_tags(link))
    with open("images_without_alt_tags.csv", "w", newline="") as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(["Image URL", "Page URL"])
        for img in images_without_alt_tags:
            writer.writerow(img)
    print("CSV file created: images_without_alt_tags.csv")


if __name__ == "__main__":
    main()
This script will:
Prompt the user for the domain URL.
Recursively fetch all the links within the domain using the get_all_links_recursive function.
Iterate through all the links and find images with empty or missing 'alt' attributes.
Write the image URLs and the corresponding page URLs to a CSV file named 'images_without_alt_tags.csv'.
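One caveat when reading the CSV: the script writes each src value exactly as it appears in the HTML, so relative paths stay relative. If you wanted absolute image URLs, urljoin (which the script already uses for links) would resolve them against the page URL. A small sketch, with example.com as a placeholder:

```python
from urllib.parse import urljoin

page_url = "https://example.com/blog/post/"  # hypothetical page URL

# A root-relative src resolves against the domain root:
print(urljoin(page_url, "/images/logo.png"))  # → https://example.com/images/logo.png

# A plain relative src resolves against the page's directory:
print(urljoin(page_url, "photo.jpg"))  # → https://example.com/blog/post/photo.jpg
```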