Bulk Checking Hreflang Tags for SEO on an eCommerce Store (With Screaming Frog)

Find URLs that should have a hreflang tag – Indexable URLs

Crawl the website with Screaming Frog

Download the “Internal” report from the first tab

Filter the relevant columns to find all URLs that return 200 (Status Code) and are indexable (Indexability)

Paste the filtered sheet into a new tab/sheet and delete irrelevant columns so you just have URL and Canonical columns

Use the =EXACT formula in a new column to flag every URL that exactly matches its canonical URL

Filter that formula column to “TRUE”

You should be left with all URLs that are indexable – i.e. 200 status codes and URLs that exactly match canonical URLs.

Copy all the URLs that remain
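
For larger exports, the same filtering can be scripted instead of done by hand in a spreadsheet. The following is a minimal sketch, not a definitive implementation: the column names (Address, Status Code, Indexability, Canonical Link Element 1) are assumptions about the “internal” export and may differ in your version, and the sample rows are invented purely for illustration.

```python
import csv
import io

# Hypothetical sample standing in for the "internal" export. The column
# names are assumptions; check them against the header row of your own file.
SAMPLE_EXPORT = """Address,Status Code,Indexability,Canonical Link Element 1
https://www.mydomain.com/,200,Indexable,https://www.mydomain.com/
https://www.mydomain.com/shoes?colour=red,200,Non-Indexable,https://www.mydomain.com/shoes
https://www.mydomain.com/old-page,301,Non-Indexable,
https://www.mydomain.com/Hats,200,Indexable,https://www.mydomain.com/hats
https://www.mydomain.com/shoes,200,Indexable,https://www.mydomain.com/shoes
"""


def indexable_urls(rows):
    """Return URLs that are 200, indexable, and identical to their canonical."""
    keep = []
    for row in rows:
        url = row["Address"].strip()
        canonical = row["Canonical Link Element 1"].strip()
        if (
            row["Status Code"].strip() == "200"
            and row["Indexability"].strip() == "Indexable"
            and url == canonical  # case-sensitive comparison, like =EXACT
        ):
            keep.append(url)
    return keep


rows = list(csv.DictReader(io.StringIO(SAMPLE_EXPORT)))
urls_for_list_mode = indexable_urls(rows)
print("\n".join(urls_for_list_mode))
```

To run this against a real export, replace the io.StringIO sample with an opened CSV file. The printed list is what you would paste into List Mode.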

Find Missing Hreflangs – Paste Back into Screaming Frog in List Mode

Switch to List Mode, paste the URLs into Screaming Frog and crawl.

Check the crawl and go to the hreflang tab – order by hreflang “Occurrences”

Export hreflang
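
Once the hreflang data is exported, the URLs with no hreflang at all can be pulled out with a few lines of script. This is only a sketch under assumptions: it presumes the export has an Address column alongside the “Occurrences” column, and the sample rows are made up for illustration.

```python
import csv
import io

# Hypothetical sample of the hreflang export. "Address" is an assumed column
# name; "Occurrences" is the column used for sorting in the hreflang tab.
SAMPLE_HREFLANG_EXPORT = """Address,Occurrences
https://www.mydomain.com/,3
https://www.mydomain.com/shoes,0
https://www.mydomain.com/hats,3
https://www.mydomain.com/sale,0
"""


def urls_missing_hreflang(rows):
    """Return URLs whose hreflang occurrence count is zero or blank."""
    return [row["Address"] for row in rows if int(row["Occurrences"] or 0) == 0]


rows = list(csv.DictReader(io.StringIO(SAMPLE_HREFLANG_EXPORT)))
missing = urls_missing_hreflang(rows)
print("\n".join(missing))
```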

Check the Screaming Frog Filters

Just discovered we have lots of non-200 hreflang links.

It’s definitely worth checking the filters on the hreflang tab

Official documentation from Screaming Frog here.

Excluding URLs in Screaming Frog Crawl [2023]

To exclude URLs just go to:

Configuration > Exclude (in the very top menu bar)

To exclude URLs within a specific folder, use the following regex:

^https://www.mydomain.com/customer/account/.*
^https://www.mydomain.com/checkout/cart/.*

The above regex will stop Screaming Frog from crawling the customer/account folder and the cart folder.
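
Before starting a long crawl, it can help to sanity-check exclude patterns against a handful of URLs. The sketch below does that with Python's re module, assuming it behaves closely enough to Screaming Frog's own regex engine for patterns this simple. The sample URLs are hypothetical and mydomain.com is a placeholder domain.

```python
import re

# The two folder patterns, written against the placeholder domain.
EXCLUDE_PATTERNS = [
    r"^https://www.mydomain.com/customer/account/.*",
    r"^https://www.mydomain.com/checkout/cart/.*",
]


def is_excluded(url, patterns=EXCLUDE_PATTERNS):
    """Return True if any exclude pattern matches the URL."""
    # These patterns start with ^ and end with .*, so a partial match
    # (re.search) and a full match give the same answer here.
    return any(re.search(pattern, url) for pattern in patterns)


# Hypothetical URLs to check.
sample_urls = [
    "https://www.mydomain.com/customer/account/login",
    "https://www.mydomain.com/checkout/cart/",
    "https://www.mydomain.com/checkout/",
    "https://www.mydomain.com/shoes",
]

for url in sample_urls:
    status = "excluded" if is_excluded(url) else "crawled"
    print(f"{status:8} {url}")
```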

Excluding Images

I’ve just been using the image extensions to block them in the crawl, e.g.

.*jpg

Although you can block them in the Configuration > Spider menu too.

Excluding Parameter URLs

This appears to do the job:

^.*\?.*
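
A quick way to confirm what the parameter pattern catches is to run it over a few test URLs. This is a rough sketch using Python's re module as a stand-in for Screaming Frog's regex engine (an assumption, though this pattern is simple enough that the two should agree). The URLs are hypothetical.

```python
import re

PARAMETER_PATTERN = r"^.*\?.*"


def has_parameters(url):
    """Return True if the URL contains a literal '?' and so matches the pattern."""
    return re.search(PARAMETER_PATTERN, url) is not None


# Hypothetical URLs to check.
sample_urls = [
    "https://www.mydomain.com/shoes?colour=red",
    "https://www.mydomain.com/search?q=",
    "https://www.mydomain.com/shoes",
]

for url in sample_urls:
    status = "excluded" if has_parameters(url) else "crawled"
    print(f"{status:8} {url}")
```

Any URL containing a question mark is excluded, including ones with an empty query string.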