So the other day I created... (with a bit of help) a script to get all site URLs and put them in a CSV file. All URLs containing non-ASCII characters, underscores, or capitals were put in a separate file.
So the assignment is:
Now... before I hit enter, it came to me: I have to do some commenting and testing.
So the assignment became:
And off we go!
Before you proceed, make sure you have installed the required libraries by running:
pip install requests beautifulsoup4
Here's the Python script for your requirement:
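The script itself didn't survive into this post, so here is a minimal sketch of what it might look like. The function names (`get_all_links_recursive`, `is_wrong_url`, `is_skipped_file`) follow the ones mentioned later in the post, but the exact logic, regex, extension list, and output filenames are my assumptions, not the original code:

```python
import csv
import re
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

# Assumed list of asset extensions the crawler should skip.
SKIPPED_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".js", ".css", ".pdf")


def is_wrong_url(url):
    """True if the URL path contains non-ASCII characters, underscores, or capitals."""
    return bool(re.search(r"[^\x00-\x7F]|_|[A-Z]", urlparse(url).path))


def is_skipped_file(url):
    """True for asset URLs (images, scripts, stylesheets, ...) we don't crawl."""
    return urlparse(url).path.lower().endswith(SKIPPED_EXTENSIONS)


def get_all_links_recursive(url, base_url, visited=None):
    """Collect all same-site links reachable from url."""
    if visited is None:
        visited = set()
    if url in visited:
        return visited
    visited.add(url)
    try:
        response = requests.get(url, timeout=10)
    except requests.RequestException:
        return visited  # skip unreachable pages, keep crawling the rest
    soup = BeautifulSoup(response.text, "html.parser")
    for anchor in soup.find_all("a", href=True):
        link = urljoin(url, anchor["href"])
        if link.startswith(base_url) and not is_skipped_file(link):
            get_all_links_recursive(link, base_url, visited)
    return visited


def main(start_url):
    links = sorted(get_all_links_recursive(start_url, start_url))
    # Good URLs go to one CSV, flagged ones to another.
    with open("urls.csv", "w", newline="") as f:
        csv.writer(f).writerows([u] for u in links if not is_wrong_url(u))
    with open("wrong_urls.csv", "w", newline="") as f:
        csv.writer(f).writerows([u] for u in links if is_wrong_url(u))
```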
Creating tests for this script is a bit challenging since it involves web requests, and the content of the web pages may change over time. However, you can create some basic tests for the helper functions.
First, split the helper functions is_wrong_url and is_skipped_file from the main script into a separate Python file named script_helpers.py. Then, import these functions in the main script using:

```python
from script_helpers import is_wrong_url, is_skipped_file
```
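The extracted script_helpers.py might then look like the following. This is only a sketch: the regex and the extension list are assumptions chosen to match the behavior described in the post, not the original implementation:

```python
# script_helpers.py -- helper predicates extracted from the main script.
import re
from urllib.parse import urlparse

# Assumed list of asset extensions the crawler should skip.
SKIPPED_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".js", ".css", ".pdf")


def is_wrong_url(url):
    """True if the URL path contains non-ASCII characters, underscores, or capitals."""
    return bool(re.search(r"[^\x00-\x7F]|_|[A-Z]", urlparse(url).path))


def is_skipped_file(url):
    """True for asset URLs (images, scripts, stylesheets) the crawler skips."""
    return urlparse(url).path.lower().endswith(SKIPPED_EXTENSIONS)
```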
Next, create a new Python file named test_script.py and include the following test script using the unittest framework:

```python
import unittest

from script_helpers import is_wrong_url, is_skipped_file


class TestScript(unittest.TestCase):
    def test_is_wrong_url(self):
        self.assertTrue(is_wrong_url("https://example.com/Test_Url"))
        self.assertTrue(is_wrong_url("https://example.com/テスト"))
        self.assertFalse(is_wrong_url("https://example.com/test-url"))

    def test_is_skipped_file(self):
        self.assertTrue(is_skipped_file("https://example.com/image.jpg"))
        self.assertTrue(is_skipped_file("https://example.com/script.js"))
        self.assertTrue(is_skipped_file("https://example.com/styles.css"))
        self.assertFalse(is_skipped_file("https://example.com/test-url"))


if __name__ == "__main__":
    unittest.main()
```
To run the tests, execute the test_script.py file from the command line:

```shell
python test_script.py
```
These tests will verify the functionality of the is_wrong_url and is_skipped_file functions by checking whether they correctly identify wrong URLs and skipped files, respectively.
Please note that these tests only cover a small part of the script's functionality. Testing the get_all_links_recursive function is more complex due to its dependency on web requests and the dynamic nature of web page content. In practice, you might use a library like responses to mock web requests and create tests for this function.
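The responses library is one option; the standard library's unittest.mock can do the same job without an extra dependency. Here is a sketch of that approach, with a stand-in `get_all_links_recursive` (its signature and the fake pages are my assumptions, not the original script):

```python
from unittest.mock import Mock, patch
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def get_all_links_recursive(url, base_url, visited=None):
    """Stand-in crawler with an assumed signature, for demonstration only."""
    if visited is None:
        visited = set()
    if url in visited:
        return visited
    visited.add(url)
    soup = BeautifulSoup(requests.get(url).text, "html.parser")
    for anchor in soup.find_all("a", href=True):
        link = urljoin(url, anchor["href"])
        if link.startswith(base_url):
            get_all_links_recursive(link, base_url, visited)
    return visited


# A tiny fake site: two pages that link to each other.
PAGES = {
    "https://example.com/": '<a href="/about">About</a>',
    "https://example.com/about": '<a href="/">Home</a>',
}


def test_crawl_follows_internal_links():
    # Patch requests.get so no real network traffic happens.
    with patch("requests.get") as mock_get:
        mock_get.side_effect = lambda url, **kw: Mock(text=PAGES.get(url, ""))
        found = get_all_links_recursive("https://example.com/", "https://example.com")
    assert found == set(PAGES)
```

The key idea is the same whether you use responses or unittest.mock: replace the HTTP layer with canned HTML so the crawl result is deterministic and the test runs offline.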