Find Broken Links in a Web Page

To find all the broken links in a web page using Selenium in Java, first collect the link elements (the elements with the tag name "a") using driver.findElements(By.tagName("a")). Then, for each link element, send an HTTP request to the URL in its href attribute. If the link is broken, the HTTP response code is one of the following.

Response Code    Description
400              Bad Request (Bad Host / Bad URL / Empty / Timeout / Reset)
403              Forbidden
404              Page Not Found
408              Request Timeout
410              Gone
503              Service Unavailable

Note that in this tutorial, a link counts as broken if the request for it returns any of the codes above. Your definition of a broken link may differ depending on your application's requirements, so adjust the list of codes accordingly.
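One way to keep that definition easy to change is to hold the codes in a single set instead of a chain of comparisons. The following is a minimal sketch, assuming Java 10 or later; the class name BrokenLinkRule and its methods are hypothetical helpers for illustration and are not part of Selenium.

```java
import java.util.Set;

// Hypothetical helper (not part of Selenium): holds the response codes
// that this project treats as "broken".
final class BrokenLinkRule {
    private final Set<Integer> brokenCodes;

    BrokenLinkRule(Set<Integer> brokenCodes) {
        // Defensive, immutable copy so later changes to the argument have no effect
        this.brokenCodes = Set.copyOf(brokenCodes);
    }

    // The six codes listed in this tutorial
    static BrokenLinkRule tutorialDefault() {
        return new BrokenLinkRule(Set.of(400, 403, 404, 408, 410, 503));
    }

    boolean isBroken(int responseCode) {
        return brokenCodes.contains(responseCode);
    }
}
```

With this in place, a stricter or looser definition is a one-line change, for example new BrokenLinkRule(Set.of(404, 410, 500)) for a project that also treats 500 as broken.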

Example

In the following program, we use Selenium with Java to visit google.com, extract all the links on the page, and check each link to see whether it is broken.

Java Program

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.List;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

public class MyAppTest {
	public static void main(String[] args) {
		System.setProperty("webdriver.chrome.driver", "/usr/local/bin/chromedriver");  
		WebDriver driver = new ChromeDriver();
		driver.get("https://google.com/ncr");

		List<WebElement> links = driver.findElements(By.tagName("a"));

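		for (WebElement link : links) {
			// href is null for anchors without a target, and non-HTTP targets
			// (mailto:, javascript:) cannot be cast to HttpURLConnection
			String url = link.getAttribute("href");
			if (url == null || !url.startsWith("http")) {
				System.out.println("[Skipped]    - " + url);
				continue;
			}

			HttpURLConnection connection = null;
			try {
				connection = (HttpURLConnection) new URL(url).openConnection();
				// HEAD requests the headers only, so the page body is not downloaded
				connection.setRequestMethod("HEAD");
				connection.connect();

				int respCode = connection.getResponseCode();
				if (respCode == 400 ||
						respCode == 403 ||
						respCode == 404 ||
						respCode == 408 ||
						respCode == 410 ||
						respCode == 503) {
					System.out.println("[Broken]     - " + url);
				} else {
					System.out.println("[Not Broken] - " + url);
				}
			} catch (MalformedURLException e) {
				e.printStackTrace();
			} catch (IOException e) {
				e.printStackTrace();
			} finally {
				// Release each connection inside the loop, not only the last one
				if (connection != null) {
					connection.disconnect();
				}
			}
		}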

		driver.quit();
	}
}
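Not every anchor on a page points at an HTTP resource: an href can be missing, or it can carry a mailto: or javascript: target, and none of these can be checked with HttpURLConnection. The following is a minimal sketch of a filter that could run before the request is sent. The class name LinkFilter and the method isHttpLink are hypothetical names chosen for illustration, and the example URLs are placeholders.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper (not part of Selenium): decides whether an href
// value is worth checking with an HTTP request.
final class LinkFilter {
    static boolean isHttpLink(String href) {
        if (href == null || href.trim().isEmpty()) {
            return false; // anchor without a usable href
        }
        try {
            // getScheme() returns null for relative references
            String scheme = new URI(href.trim()).getScheme();
            return "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
        } catch (URISyntaxException e) {
            return false; // not parseable as a URI, so not checkable
        }
    }

    public static void main(String[] args) {
        // Placeholder values for illustration
        String[] samples = {
            "https://example.com/page", "mailto:someone@example.com", "javascript:void(0)"
        };
        for (String href : samples) {
            System.out.println(href + " -> " + isHttpLink(href));
        }
    }
}
```

Links that fail this check are neither broken nor working in the sense used here, so it may be clearer to report them separately instead of counting them in either group.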

Screenshots

1. Initialize web driver and visit google.com.

WebDriver driver = new ChromeDriver();
driver.get("https://google.com/ncr");

2. Check whether each link is broken.

Console Output.