Find Broken Links in a Web Page

To find all the broken links in a web page using Selenium in Java, first collect the web elements with the tag name "a" using driver.findElements(By.tagName("a")). Then, for each link element, read its href attribute and send an HTTP request to that URL. If the link is broken, the HTTP response code is one of the following.

Response Code   Description
400             Bad Request (Bad Host / Bad URL / Empty / Timeout / Reset)
403             Forbidden
404             Page Not Found
408             Request Timeout
410             Gone
503             Service Unavailable

Note that in this tutorial, a link is considered broken if the request for it returns any of the above codes. Your definition of a broken link may differ depending on your application's requirements; adjust the check accordingly.
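Because the set of "broken" codes is a project-specific choice, it can help to keep it in one place instead of a chain of comparisons. The following is a minimal sketch; the class name BrokenLinkRules and the method isBroken are hypothetical names used only for illustration, and the code set simply mirrors the table above.

```java
import java.util.Set;

// Hypothetical helper: classifies an HTTP status code using this tutorial's
// definition of a broken link. Edit BROKEN_CODES to match your own definition.
public class BrokenLinkRules {

    // Status codes treated as "broken" (same list as the table above)
    static final Set<Integer> BROKEN_CODES = Set.of(400, 403, 404, 408, 410, 503);

    static boolean isBroken(int responseCode) {
        return BROKEN_CODES.contains(responseCode);
    }

    public static void main(String[] args) {
        int[] samples = {200, 301, 404, 503};
        for (int code : samples) {
            System.out.println(code + " -> " + (isBroken(code) ? "Broken" : "Not Broken"));
        }
    }
}
```

Set.of requires Java 9 or later. If your definition is instead "any client or server error", the check could become responseCode >= 400; note that codes such as 500 are not in the tutorial's list and are therefore reported as not broken by this sketch.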

Example


The following Selenium Java program visits google.com, extracts all the links on the page, and checks whether each link is broken.
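Not every "a" element gives a URL that can be checked with an HTTP request: getAttribute("href") returns null when the attribute is missing, and links such as mailto: or javascript: cannot be opened as an HttpURLConnection. A page like google.com may also contain the same link several times. The sketch below filters and de-duplicates the href values before any requests are sent. The class LinkFilter and the method httpLinks are hypothetical names; the code is plain Java, so it runs without a browser.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: reduces raw href values (for example, the results of
// WebElement.getAttribute("href")) to a de-duplicated list of HTTP(S) URLs.
public class LinkFilter {

    static List<String> httpLinks(List<String> hrefs) {
        // LinkedHashSet drops duplicates but keeps first-seen order
        Set<String> unique = new LinkedHashSet<>();
        for (String href : hrefs) {
            if (href == null) {
                continue; // <a> element without an href attribute
            }
            String trimmed = href.trim();
            String lower = trimmed.toLowerCase(Locale.ROOT);
            // Keep only http:// and https:// links; skip mailto:, javascript:, tel:, etc.
            if (lower.startsWith("http://") || lower.startsWith("https://")) {
                unique.add(trimmed);
            }
        }
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        List<String> hrefs = Arrays.asList(
                "https://example.com/a",
                null,
                "mailto:someone@example.com",
                "javascript:void(0)",
                "https://example.com/a",
                "http://example.com/b");
        System.out.println(httpLinks(hrefs));
    }
}
```

In a Selenium script, the input list could be built by calling getAttribute("href") on each element returned by driver.findElements(By.tagName("a")).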

Java Program

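import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.List;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

public class MyAppTest {
	public static void main(String[] args) {
		// Path to the ChromeDriver executable (Selenium 4.6+ can locate it automatically)
		System.setProperty("webdriver.chrome.driver", "/usr/local/bin/chromedriver");
		WebDriver driver = new ChromeDriver();
		try {
			driver.get("https://google.com/ncr");

			// Collect every anchor element on the page
			List<WebElement> links = driver.findElements(By.tagName("a"));

			for (WebElement link : links) {
				String url = link.getAttribute("href");

				// Skip anchors without an href, and non-HTTP(S) links such as mailto: or
				// javascript:, which cannot be cast to HttpURLConnection
				if (url == null || !(url.startsWith("http://") || url.startsWith("https://"))) {
					continue;
				}

				HttpURLConnection connection = null;
				try {
					connection = (HttpURLConnection) new URL(url).openConnection();
					connection.setRequestMethod("HEAD"); // request headers only, not the page body
					connection.setConnectTimeout(5000);  // avoid hanging on unreachable hosts
					connection.setReadTimeout(5000);
					connection.connect();

					int respCode = connection.getResponseCode();
					if (respCode == 400 ||
							respCode == 403 ||
							respCode == 404 ||
							respCode == 408 ||
							respCode == 410 ||
							respCode == 503) {
						System.out.println("[Broken]     - " + url);
					} else {
						System.out.println("[Not Broken] - " + url);
					}
				} catch (MalformedURLException e) {
					e.printStackTrace();
				} catch (IOException e) {
					e.printStackTrace();
				} finally {
					// Release every connection, not only the last one
					if (connection != null) {
						connection.disconnect();
					}
				}
			}
		} finally {
			// Close the browser even if an unexpected exception occurs
			driver.quit();
		}
	}
}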
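The program sends HEAD requests to avoid downloading page bodies. Some servers do not implement HEAD and answer 405 Method Not Allowed; since 405 is not in the list above, such a link is reported as not broken without the program learning whether the page actually exists. A common workaround, sketched here as an assumption rather than part of the original tutorial, is to retry the same URL with GET. The class HeadWithFallback and its local test server (built on the JDK's com.sun.net.httpserver package) are hypothetical, included so the sketch runs without network access.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;

// Sketch: send a HEAD request and, if the server answers 405 Method Not Allowed,
// retry the same URL with GET. A throwaway local server stands in for a real site.
public class HeadWithFallback {

    static int statusCode(String url) throws IOException {
        int code = request(url, "HEAD");
        if (code == HttpURLConnection.HTTP_BAD_METHOD) { // 405
            code = request(url, "GET");
        }
        return code;
    }

    static int request(String url, String method) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            conn.setRequestMethod(method);
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(5000);
            return conn.getResponseCode(); // does not throw for 4xx/5xx codes
        } finally {
            conn.disconnect();
        }
    }

    // Hypothetical local server: /ok -> 200, /head-not-allowed -> 405 for HEAD
    // but 200 for GET, any other path -> 404.
    static HttpServer startServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            String path = exchange.getRequestURI().getPath();
            String method = exchange.getRequestMethod();
            int code;
            if (path.equals("/ok")) {
                code = 200;
            } else if (path.equals("/head-not-allowed")) {
                code = method.equals("HEAD") ? 405 : 200;
            } else {
                code = 404;
            }
            exchange.sendResponseHeaders(code, -1); // -1: no response body
            exchange.close();
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startServer();
        String base = "http://127.0.0.1:" + server.getAddress().getPort();
        try {
            for (String path : new String[] {"/ok", "/head-not-allowed", "/missing"}) {
                System.out.println(path + " -> " + statusCode(base + path));
            }
        } finally {
            server.stop(0);
        }
    }
}
```

In the Selenium program, statusCode(url) could replace the inline HEAD request. The trade-off is that a GET retry downloads the response body for those servers, so it is slower than HEAD alone.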

Screenshots

1. Initialize the WebDriver and visit google.com.

WebDriver driver = new ChromeDriver();
driver.get("https://google.com/ncr");

2. Check whether each link is broken.

Console Output.
