How To Find Broken Images Using Selenium WebDriver?

Posted by Himanshu Sheth | February 17, 2021

A web product’s user experience is one of the key elements that help in user acquisition and user retention. Though immense focus should be given to the design and development of new product features, a continuous watch should be kept on the overall user experience. Like 404 pages (or dead links), broken images on a website (or web app) can irk end-users. Manually inspecting and removing broken images is neither feasible nor scalable. Instead of relying on third-party tools to inspect broken images, you can leverage Selenium automation testing to find broken images on your website using Selenium WebDriver.


In this part of the Selenium Tutorial, we look at how to find broken images on websites using Selenium WebDriver. From an end-user’s perspective, even a single broken image on a page could be an experience dampener – a prime reason to find broken images on websites.

By the end of this blog, you will be able to find broken images using Selenium WebDriver with Python, Java, C#, and PHP.

What are Broken Images in Web Testing?

A broken image is an image on a page that does not show up as a picture; the browser displays a placeholder (or nothing) instead, and following its link takes the end-user to a defunct resource. Requesting a broken image typically returns a 404 error, which means that there is an issue with the image URL and the image could not be loaded (for various reasons).

Shown below is an example of broken images on a website:

Broken Images in Web Testing

From an end-user experience and retention point of view, fixing broken images should be considered as important as fixing broken links on websites. Selenium WebDriver can be used to find broken images on websites. The internal logic for locating broken images might vary based on how the images are fetched from the server.

Here are two ways in which images are read from the server:

  • Absolute Path – As the name indicates, the website uses the absolute path (or complete path) in the ‘src’ attribute that specifies the path to the intended image. The <img> tag in HTML creates a holding space for the referenced image.

    Shown below is an example of usage of an absolute path in the ‘src’ attribute of the <img> tag:

    Absolute Path

    The image shown above is fetched from an absolute location (i.e., the host name is included in the ‘src’ attribute):

    Broken Image Test

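  • Relative Path – On many websites, a relative path is placed in the ‘src’ attribute. A relative path is resolved against the URL of the page that contains it (or against the page’s <base> element, if present); a path that starts with / is resolved against the root of the website (or web app).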

    For example, in <img src="assets/img/image.jpg" alt="some text">, the path of image.jpg is relative to the page. If the page URL is https://www.someexample.com, the image resolves to https://www.someexample.com/assets/img/image.jpg

    Here is a sample usage of a relative path in the ‘src’ attribute of the <img> tag:

    usage of relative path

    You might be curious about what leads to broken images on a website. Let’s look at the ‘why’ part of broken images.

Major reasons for Broken images on a webpage

Here are some of the prominent reasons that lead to broken images (i.e., file not found or 404 error for images) on a website (or web apps):

  • Incorrect Image Format – If you upload an image in .jpg format but refer to it with a .png extension in the code, the browser requests a file that does not exist, and the image fails to display. Ensure that the file name and format used in the code match the file uploaded to the server.
  • Incorrect Image URL – When rendering an image, the browser reads the image location from the ‘src’ attribute of the <img> tag. If a wrong image path or an incorrect file name is specified in the ‘src’ attribute, the image cannot be displayed (and the request returns a 404 error).
  • Deleted Image File – The link in the HTML code could refer to a file that is misspelled in the code or no longer exists on the server.
  • Site relocation – After relocating the site from one provider to another, a thorough check should be conducted to verify whether all the site assets are available and accessible on the new server.
  • 301 redirection – During a website redesign, 301 redirects should be set up for the site content and for the images used on the site. Along with redirecting URLs, utmost attention should be given to redirecting the images referenced on those URLs.
  • Unavailability of the server – In scenarios where the server does not give a response within a certain time-frame, images would fail to appear on the site.

Like broken links, attention should be given to ensure that your web product is free from broken images.

Why should you check for broken images?

Here are the two major reasons for checking for broken images on websites:

  • Broken images on a website hamper the end-user experience, which could negatively impact the growth of the product.
  • Images are an essential part of the content marketing strategy. However, broken images can create SEO issues. Images with missing alt attributes and broken internal images are problematic from an SEO point of view and should be addressed as a high priority.

How to Find Broken Images Using Selenium WebDriver?

When a user visits a website, the user request is sent to the website’s server, which processes the request. In response to the browser’s request, the server sends a three-digit code referred to as the HTTP Status Code to the browser.

HTTP status codes are grouped into five classes: 1xx (informational), 2xx (success), 3xx (redirection), 4xx (client error), and 5xx (server error).
To find broken images using Selenium WebDriver, we look for the 4xx class of status codes, which indicates that the requested resource (for example, an image) could not be served. A status code of class 2xx (particularly 200) indicates that the request sent by the web browser was successful and the appropriate response was sent to the browser.

When an image is not available on the server, the server sends response code 404 (Not Found) to the web browser. You can refer to our earlier blog for detailed information on HTTP status codes and the status codes returned when detecting broken links and images.
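
To make the status-code classes concrete, here is a small Python sketch that labels a code with its class using the standard-library http.HTTPStatus enum and applies a broken/not-broken rule. The helper names are hypothetical, and treating every code outside 2xx as broken is an assumption that is slightly more lenient than the strict “equals 200” check used by the examples later in this article.

```python
from http import HTTPStatus

# Status code classes as named in RFC 9110.
STATUS_CLASSES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def describe_status(code: int) -> str:
    """Return a label such as '404 Not Found (Client Error)'."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:  # codes not registered in HTTPStatus, e.g. 299
        phrase = "Unknown"
    status_class = STATUS_CLASSES.get(code // 100, "Invalid")
    return f"{code} {phrase} ({status_class})"


def is_broken_image_status(code: int) -> bool:
    """Hypothetical rule: anything outside the 2xx range means the image was not served."""
    return not 200 <= code < 300
```

For example, describe_status(404) gives "404 Not Found (Client Error)" and is_broken_image_status(404) is True.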

Irrespective of the programming language being used to detect broken images, the basic principles remain the same. Here are some of the steps that can be followed to find broken images on websites:

  1. Locate the <img> elements to collect details of the images present on the page.
  2. For each <img> element, read its ‘src’ attribute.
  3. Convert the path obtained from the ‘src’ attribute to an absolute path. This conversion is usually not required with Selenium Java, Selenium C#, and Selenium Python, because WebDriver’s getAttribute("src") returns the URL already resolved by the browser. It is required in the Selenium PHP example, which parses the raw HTML source of the page instead of querying WebDriver.
  4. Send an HTTP request to the image URL obtained in step (3) and capture the response code.
  5. Check the response code sent by the server. Response code 200 (i.e., HttpStatusCode.OK) means that the image is available on the server.
  6. Mark the image as broken if the server returns any other response code (for example, 404).
  7. Repeat steps (2-6) for every image present on the page.
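
The steps above can be sketched in Python with only the standard library. This is a minimal sketch, not the implementation used later in this article: it assumes the src values have already been collected (step 2), resolves them with urllib.parse.urljoin (step 3), and sends a GET request with urllib.request (step 4). The function names and the fetch_status parameter are hypothetical; fetch_status is a hook for swapping in another HTTP client or a stub.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List, Optional, Tuple
from urllib.parse import urljoin


def http_status(url: str, timeout: float = 10.0) -> int:
    """Send a GET request and return the final HTTP status code (redirects are followed)."""
    request = urllib.request.Request(url, method="GET")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urlopen raises HTTPError for 4xx/5xx responses; the code is still available.
        return error.code


def find_broken_images(
    page_url: str,
    srcs: Iterable[str],
    fetch_status: Callable[[str], Optional[int]] = http_status,
) -> List[Tuple[str, Optional[int]]]:
    """Return (absolute_url, status) for every image whose status is not 200."""
    broken = []
    for src in srcs:                           # step 7: repeat for every image
        absolute_url = urljoin(page_url, src)  # step 3: relative -> absolute
        status = fetch_status(absolute_url)    # step 4: send request, read the code
        if status != 200:                      # steps 5-6: only 200 counts as available
            broken.append((absolute_url, status))
    return broken
```

A server that does not answer at all makes urlopen raise urllib.error.URLError, which you may also want to count as a broken image; some servers also reject the default Python User-Agent, in which case adding a browser-like header to the Request helps.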

Alternatively, you can check the naturalWidth property of each image. It returns the intrinsic (original) width of the image and is zero for an image that failed to load. The Selenium Java example below uses this check as a second approach.

In this Selenium Tutorial, we demonstrate how to find broken images using Selenium WebDriver in Java, Python, C#, and PHP. The tests are run on the latest version of the Chrome Browser on the Windows 10 platform. The execution is carried out on the cloud-based Selenium Grid provided by LambdaTest.

To get started with LambdaTest, create an account on the website and note the username and access key from the profile section on LambdaTest. The browser capabilities are generated using the LambdaTest Capabilities Generator.
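
For reference, here is a sketch of what the remote configuration for this setup might look like in Python. The hub address (hub.lambdatest.com/wd/hub) and the vendor key LT:Options are assumptions based on LambdaTest’s Selenium 4 conventions, and the build and test names are hypothetical; regenerate the capabilities with the Capabilities Generator for your own account.

```python
from typing import Any, Dict, Tuple
from urllib.parse import quote

HUB_HOST = "hub.lambdatest.com/wd/hub"  # assumed LambdaTest Selenium hub endpoint


def lambdatest_remote_config(username: str, access_key: str) -> Tuple[str, Dict[str, Any]]:
    """Build (hub URL, capabilities) for a Chrome (latest) on Windows 10 session."""
    # Credentials go into the URL, percent-encoded in case they contain '/', '@', etc.
    hub_url = f"https://{quote(username, safe='')}:{quote(access_key, safe='')}@{HUB_HOST}"
    capabilities: Dict[str, Any] = {
        "browserName": "Chrome",
        "browserVersion": "latest",
        # Vendor-specific options; assumed to mirror the Capabilities Generator output.
        "LT:Options": {
            "platformName": "Windows 10",
            "build": "Broken image checks",  # hypothetical build name
            "name": "Find broken images",    # hypothetical test name
        },
    }
    return hub_url, capabilities
```

With the Selenium 4 Python bindings, each capability would typically be applied to a ChromeOptions object with options.set_capability(key, value), and the session opened with webdriver.Remote(command_executor=hub_url, options=options). Reading the username and access key from environment variables is preferable to hard-coding them.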

Here is the test scenario to find broken images on the website:

Test Scenario

  1. Go to https://the-internet.herokuapp.com/broken_images on Chrome (latest)
  2. Read the details about the images present on the page
  3. Send HTTP request for each image
  4. Check the response code of the HTTP request. If the response code is 200, the image is not broken; otherwise, the image is broken.
  5. Print whether the image is broken or not on the terminal
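
The scenario above can be sketched in Python against an already-open WebDriver session. This is an illustrative sketch, not the article’s code: the driver is duck-typed (anything with get() and find_elements()), the string "tag name" is the value of Selenium’s By.TAG_NAME, and fetch_status is a hypothetical callable that returns the HTTP status code for a URL.

```python
from typing import Callable, List, Optional

PAGE_URL = "https://the-internet.herokuapp.com/broken_images"
TAG_NAME = "tag name"  # same string as selenium.webdriver.common.by.By.TAG_NAME


def report_broken_images(driver, fetch_status: Callable[[str], Optional[int]]) -> List[str]:
    """Run the test scenario on an existing WebDriver session and return broken image URLs."""
    driver.get(PAGE_URL)                            # 1. open the page
    images = driver.find_elements(TAG_NAME, "img")  # 2. read the images on the page
    broken = []
    for image in images:
        src = image.get_attribute("src")  # usually the resolved, absolute URL
        if not src:
            print("Found an <img> without a src attribute")
            continue
        status = fetch_status(src)                  # 3-4. send request, read status
        if status == 200:
            print(f"{src} is not broken")           # 5. print the verdict
        else:
            print(f"{src} is broken (status {status})")
            broken.append(src)
    return broken
```

In a real run, driver would be the remote session on LambdaTest, and fetch_status could wrap an HTTP client such as requests (for example, returning requests.get(src, stream=True).status_code).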

The URL under test https://the-internet.herokuapp.com/broken_images has two broken images and two proper images.

Shown below are the two broken images on the website:

broken images

Here are the two proper (or not broken) images on the website:

broken image check

How to find Broken Images using Selenium Java?

FileName – pom.xml

Implementation

FileName – testng.xml

FileName – test_brokenimages.java

Code Walkthrough [Approach – 1]

1. Import the required packages

The Apache HttpClient library is used for handling the HTTP requests. To use the latest version of HttpClient library, the dependency for the library is added to the Maven Build file (pom.xml).

To find the broken images on the page under test, the HttpClient library is used for checking the status codes of the images present on the page. The necessary packages are imported so that their methods can be used in the implementation.

2. Find all the images on the page

The findElements method in Selenium is used for fetching the details of all the images present on the page. The images are located using By.tagName("img").

The images are placed in a list, which will be further iterated to find broken images on the page.

3. Create a new instance of HttpClient

The Apache HttpClient library provides CloseableHttpClient for sending HTTP requests and reading the responses. A new instance of the client is created with the build() method of the HttpClientBuilder class.

4. Create a new instance of HttpGet

CloseableHttpClient provides the execute method for sending and receiving the data. The execute method uses the parameter of type HttpUriRequest, which has many sub-classes, including HttpGet and HttpPost.

We first create a new HttpGet object whose request URI is the value of the src attribute read from the img WebElement.

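For example, the src attribute of the image “Fork me on GitHub” is /img/forkme_right_green_007200.png in the page’s HTML. getAttribute("src") returns this value resolved against the page URL (https://the-internet.herokuapp.com/img/forkme_right_green_007200.png), so HttpGet receives an absolute URL.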

Broken Image Test Using Selenium
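
To see both forms of the src value side by side, Selenium 4 exposes them separately: in the Python bindings, get_dom_attribute() returns the attribute as written in the HTML, while get_property("src") returns the URL the browser resolved (the Java bindings offer getDomAttribute() and getDomProperty() for the same purpose). The helper below is a hypothetical sketch that takes an <img> WebElement.

```python
from typing import Optional, Tuple


def src_forms(image) -> Tuple[Optional[str], Optional[str]]:
    """Return (src as written in the HTML, src as resolved by the browser) for an <img>."""
    raw = image.get_dom_attribute("src")  # e.g. "/img/forkme_right_green_007200.png"
    resolved = image.get_property("src")  # absolute URL built from the page URL
    return raw, resolved
```

Comparing the two values is a quick way to confirm whether a page uses relative or absolute image paths.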

5. Retrieve the response object

The execute method executes the HTTP request using the default context and returns the response as a CloseableHttpResponse (an HttpResponse), which carries the status line, headers, and body.

6. Read the Status Code

The getStatusLine method of the HttpResponse class returns the status line of the response obtained in step (5), and the getStatusCode method returns the status code as an integer. Response code 200 (HttpStatus.SC_OK) means that the HTTP request was executed successfully.

If the status code is 200, the concerned image is not broken, whereas a broken image returns 404. Steps (3) through (6) are repeated for all the WebElement entries in the image list. The outerHTML attribute of each broken image is printed on the terminal for reference.

Code Walkthrough [Approach – 2]

1. Find all the images on the page

Similar to Step (2) in Approach – 1, the findElements method in Selenium is used to fetch the details of the images present on the page, with By.tagName("img") as the locator.

2. Read the naturalWidth attribute

The naturalWidth attribute of the WebElements identified in Step(1) is read. For broken images, naturalWidth will be zero whereas it is non-zero for normal images.

Step (2) is repeated for all the WebElements in the list image_list, which was obtained in Step (1). The variable iBrokenImageCount indicates the number of broken images on the page.
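
The same naturalWidth idea can be sketched in Python. This is a hedged sketch rather than a port of the Java code: the function name is hypothetical, and it assumes the images were collected with find_elements. Note that an image that is still loading also reports a naturalWidth of 0, so the check should run after the page has finished loading.

```python
from typing import Iterable


def count_broken_by_natural_width(images: Iterable) -> int:
    """Count <img> WebElements whose naturalWidth is 0 (Approach 2, sketched in Python)."""
    broken_count = 0
    for image in images:
        # get_attribute() falls back to the DOM property of the same name.
        # The value may arrive as "0", 0 or None, so normalise it before comparing.
        natural_width = image.get_attribute("naturalWidth")
        if int(natural_width or 0) == 0:
            broken_count += 1
    return broken_count
```

Reading the property with driver.execute_script("return arguments[0].naturalWidth", image) is an equivalent option that always returns a number.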

Execution

Shown below are the execution snapshots of Approach – 1 and Approach – 2. As expected, we see that there are two broken images on the webpage under test.

Execution

test execution snapshot

How to find Broken Images using Selenium Python?

Implementation

FileName – test_brokenimages.py

Code Walkthrough

1. Import Modules

The requests module is imported so that we can send HTTP requests to the target URL. If the requests module is not installed on the development machine, install it with the command pip install requests.

Import Modules

2. Fetch details about the images present on the page

WebElements with the ‘img’ tag are read using the find_elements method in Selenium.

3. Send an HTTP request

The get() method of the requests module sends a GET request to the URL passed to it. The src attribute of the img tag, which contains the location of the image on the server, is passed to requests.get(). The stream parameter is set to True, so only the response headers are downloaded up front; the image body is downloaded only if it is accessed, which keeps the check lightweight.

In return, we get a requests.Response object that contains the server’s response to the HTTP request.

4. Read the Status Code from the Response object

The status_code property of the requests.Response object indicates the status of the HTTP request. HTTP status code 200 means that the image is not broken, whereas the image is broken if the status code is 404.

Repeat steps (3) and (4) for all the WebElement entries in the list (i.e., image_list).
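
Steps (3) and (4) can be condensed into a helper like the one below. It is a sketch under stated assumptions: session is expected to behave like requests.Session (it is duck-typed, so the block runs without requests installed), and the function name is hypothetical. Because stream=True leaves the body unread, the response is closed explicitly to release the connection.

```python
from typing import Iterable, List


def broken_image_urls(session, urls: Iterable[str], timeout: float = 10.0) -> List[str]:
    """Return the URLs whose GET request does not come back with HTTP 200."""
    broken = []
    for url in urls:
        # stream=True fetches only the headers up front; the body is never read here.
        response = session.get(url, stream=True, timeout=timeout)
        try:
            if response.status_code != 200:
                broken.append(url)
        finally:
            # Release the connection back to the pool without downloading the body.
            response.close()
    return broken
```

A typical call would be broken_image_urls(requests.Session(), [image.get_attribute("src") for image in image_list]). Connection errors and timeouts raise requests exceptions, which you may want to catch and count as broken images.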

Execution

We run the file with the command python test_brokenimages.py on the terminal. As shown below, two broken images were found on the page under test.

Test Execution

How to find Broken Images using Selenium C#?

Implementation

FileName – BrokenImageTest.cs

Code Walkthrough

We have used the NUnit framework for demonstration. You can check out our earlier blog on NUnit Test automation with Selenium C# to get started with the NUnit framework.

1. Include HttpClient namespace

The HttpClient class (in the System.Net.Http namespace) is used for sending HTTP requests and receiving HTTP responses from a resource identified by a URI.

Microsoft recommends HttpClient over the older HttpWebRequest class (in the System.Net namespace), so HttpClient is used here for detecting broken images using Selenium WebDriver.

2. Create a method that returns an async task

The GetAsync method is used for sending a GET request to the specified URI as an asynchronous operation.

3. Create an instance of HttpClient

An instance of the HttpClient is created. The methods offered by HttpClient class will be further used for fetching the details of images present on the page under test.

4. Read the images present on the page

The details of the images present on the page are fetched by locating the WebElements with By.TagName("img").

The FindElements method returns a collection that is iterated to check for broken images on the page.

5. Iterate through the image list to check for broken images

The GetAsync method of the HttpClient class sends an asynchronous GET request to the corresponding URI. The value of the image’s ‘src’ attribute, read with the GetAttribute method, is passed to GetAsync.

6. Read the HttpStatus Code

On completion of the Async operation in Step(5), HttpResponseMessage is returned. The response includes the data and status code. Response code HttpStatusCode.OK (i.e., 200) indicates that the image was located on the server, and it was read successfully. We keep a counter of the number of broken images on the page.

The exceptions NotSupportedException and ArgumentNullException are handled as part of exception handling.
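
The C# version relies on GetAsync so that requests do not block. A comparable pattern in Python is to run a blocking status check in worker threads with asyncio.to_thread (Python 3.9+) and cap concurrency with a semaphore. This is an analogue rather than a translation of the C# code: fetch_status is a hypothetical callable returning a status code, and mapping ValueError (raised, for instance, by urllib for a missing or malformed URL) to None loosely mirrors the exception handling described above.

```python
import asyncio
from typing import Callable, Dict, Iterable, Optional


async def statuses_concurrently(
    urls: Iterable[str],
    fetch_status: Callable[[str], int],
    max_concurrency: int = 5,
) -> Dict[str, Optional[int]]:
    """Check many image URLs concurrently; None marks a URL that could not be requested."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def check(url: str) -> Optional[int]:
        async with semaphore:  # at most max_concurrency requests in flight
            try:
                # Run the blocking fetch in a worker thread.
                return await asyncio.to_thread(fetch_status, url)
            except ValueError:
                return None

    url_list = list(urls)
    results = await asyncio.gather(*(check(url) for url in url_list))
    return dict(zip(url_list, results))
```

Usage: asyncio.run(statuses_concurrently(image_urls, fetch_status)), then count the entries whose value is not 200 to get the number of broken images.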

Execution

Here is the execution snapshot, which indicates that two broken images were present on the page under test.

Execution

two broken images

How to find Broken Images using Selenium PHP?

Implementation

FileName – composer.json

FileName – tests\BrokenImageTest.php

Code Walkthrough

To find broken images with Selenium PHP, we would be using the PHPUnit framework with Selenium. Refer to our detailed Selenium PHP tutorial for a quick recap on Selenium with PHPUnit.

Run the command composer install on the terminal to install the packages listed in composer.json.

command composer

Here is the overall walkthrough of the source code:

1. Read the page source

The HTML source of the page under test (i.e., https://the-internet.herokuapp.com/broken_images) is read with PHP’s file_get_contents function and stored in the local string variable $html.

2. Instantiate the DOMDocument class

The entire HTML document is represented in the DOMDocument class. It also serves as the root of the source tree.

3. Parse HTML source of the page

The DOMDocument::loadHTML() method parses the HTML source stored in the string variable $html. It returns true on success (and false on failure).

4. Extract the Images using ‘img’ tag

The <img> entries are read using the getElementsByTagName method of the DOMDocument class. As we are looking for broken images, the search is based on the <img> tag in the parsed HTML source.

5. Read the entries enclosed in ‘src’ attribute

The values of the ‘src’ attribute are read from the <img> entries extracted in Step(4).

For example, the ‘src’ attribute in <img src="img/avatar-blank.jpg"> is “img/avatar-blank.jpg”.
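
For comparison, steps (2) to (5) can be sketched in Python with the standard-library html.parser module in place of DOMDocument. The class and function names are hypothetical, and the sketch takes the HTML as a string; fetching it (step 1) could be done with urllib.request.urlopen(url).read().decode().

```python
from html.parser import HTMLParser
from typing import List, Optional


class ImageSrcCollector(HTMLParser):
    """Collect the raw src attribute of every <img> tag, like getElementsByTagName('img')."""

    def __init__(self) -> None:
        super().__init__()
        self.srcs: List[Optional[str]] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; <img ... /> also lands here.
        if tag == "img":
            self.srcs.append(dict(attrs).get("src"))


def extract_image_srcs(html: str) -> List[Optional[str]]:
    """Return the src values (None when missing) in document order."""
    parser = ImageSrcCollector()
    parser.feed(html)
    parser.close()
    return parser.srcs
```

As in the PHP version, the values returned are the raw attribute values, so relative paths still need to be converted to absolute URLs before sending HTTP requests.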

6. Convert the relative path to absolute path

This step is only applicable if the ‘src’ attribute in the <img> tag returns a relative path from the root of the document.

In the case of http://the-internet.herokuapp.com/broken_images, the images are read using the relative path.

Convert the relative path

On the LambdaTest blog, by contrast, images are referenced by their absolute path on the server. Shown below is an example of how the absolute path of an image is used in the ‘src’ attribute of the <img> tag:

LambdaTest blog

We created a new function, relative2absolute(), that takes two arguments: the relative path obtained from the ‘src’ attribute and the base URL of the page under test.

Relative Path (Sample)

For http://the-internet.herokuapp.com/broken_images, the relative path equates to /$img_path. If $img_path is img/avatar-blank.jpg, the final relative path used by the relative2absolute function is /img/avatar-blank.jpg. The base URL is set to https://the-internet.herokuapp.com/.

Absolute path (Sample)

If an absolute path is used in the ‘src’ attribute, the value is already a complete URL and needs no conversion. In such a scenario, Step (6) becomes optional.

We came up with the relative2absolute function with support from the StackOverflow Community ☺.
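
Outside PHP, this conversion is often available in the standard library. In Python, urllib.parse.urljoin implements RFC 3986 reference resolution, so a counterpart of relative2absolute() can be sketched in a few lines. The function names are hypothetical; the extra is_http_url() helper is an assumption added to skip values such as data: URIs, which cannot be checked with an HTTP request.

```python
from urllib.parse import urljoin, urlsplit


def relative_to_absolute(page_url: str, src: str) -> str:
    """Resolve an <img> src against the URL of the page it appears on."""
    # Absolute URLs come back unchanged, "/x" is resolved from the site root,
    # "x" from the page's directory, and "//host/x" reuses the page's scheme.
    return urljoin(page_url, src)


def is_http_url(url: str) -> bool:
    """Only http(s) URLs can be checked with an HTTP request."""
    return urlsplit(url).scheme in ("http", "https")
```

For example, relative_to_absolute("https://the-internet.herokuapp.com/broken_images", "img/avatar-blank.jpg") returns https://the-internet.herokuapp.com/img/avatar-blank.jpg, matching the result described above.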

7. Check the HTTP status code of each image

The get_headers() function is used to fetch all the headers sent by the server in response to the HTTP request. For a broken image, the HTTP status code is 404, whereas the status code is 200 if the image is present on the server.

The preg_match() function in PHP does a case-insensitive search for “200” (HTTP Status Code if the request is completed successfully) in the response code. The local variable iBrokenImageCount is incremented when a broken image is present on the page.
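
A note on matching: searching header text for “200” anywhere can misfire (a header such as Content-Length: 200 also contains it), and when the image URL redirects, the first status line belongs to the redirect rather than to the final response. The Python sketch below shows a stricter variant of the idea; it assumes a list of raw header lines similar to what get_headers() returns, and the names are hypothetical.

```python
import re
from typing import Iterable, Optional

# Matches status lines such as "HTTP/1.1 200 OK" or "HTTP/2 404".
STATUS_LINE = re.compile(r"^HTTP/\d(?:\.\d)?\s+(\d{3})\b")


def final_status_code(header_lines: Iterable[str]) -> Optional[int]:
    """Return the code of the last status line, i.e. the response after any redirects."""
    code = None
    for line in header_lines:
        match = STATUS_LINE.match(line)
        if match:
            code = int(match.group(1))
    return code
```

An image would then be counted as broken when final_status_code(...) is anything other than 200.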

Execution

To run the test that is using the PHPUnit framework, run the following command from the root folder:

When the test is run against https://the-internet.herokuapp.com/broken_images, it shows that the page has two broken images.

test broken image using selenium

We executed the same test against the LambdaTest blog after making minimal changes to the code under the ‘For a site with absolute path’ comment.

LambdaTest blog

LambdaTest blog

The site uses the absolute path in <src> attribute of img tag. As seen below, there are zero broken images on the LambdaTest blog.

absolute path

That’s all folks


Like broken links on web pages, broken images can hinder the overall user experience. They also have a negative impact on search rankings, thereby hampering your SEO efforts. Instead of relying on third-party tools that may put your privacy and data at stake, you can find broken images using Selenium WebDriver. In this Selenium tutorial, we looked at how to find broken images using Selenium WebDriver with Java, Python, C#, and PHP.

What strategy do you follow for finding broken images on webpage(s)? Do leave your thoughts in the comments section…

Happy Testing ☺

Frequently Asked Questions

How do I find an image in selenium?

To get the source of an image, locate the <img> element with Selenium WebDriver (for example, with an XPath such as //img or any other locator) and call getAttribute("src") on it. The method returns the value of the element’s src attribute, i.e., the location of the image.

How do you find hidden elements in selenium?

Hidden elements can be located with the same locators as visible ones (ID, name, CSS selector, or XPath). For example, hidden form fields carry a type="hidden" attribute, so an XPath such as //input[@type='hidden'] (or the CSS selector input[type='hidden']) finds them. isDisplayed() returns false for such elements, and Selenium does not let you click or type into them, but you can still read their attributes with getAttribute().

How do I get all the links in selenium?

  1. Navigate to the web page from which you want to get the links.
  2. Get the list of WebElements with the tag name ‘a’ using driver.findElements():
    List<WebElement> allLinks = driver.findElements(By.tagName("a"));
  3. Use a for-each loop to traverse the list.
  4. Print the link text using getText() along with its address using getAttribute("href"):
    System.out.println(link.getText() + " – " + link.getAttribute("href"));
