The best report in Screaming Frog for seeing the source and destination of every 404 is under Bulk Export in the top menu:

then Response Codes > Client Error Inlinks (this covers all 4xx responses, not just 404s).
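If you would rather summarise that export outside a spreadsheet, here is a minimal Python sketch. It assumes the CSV has "Source" and "Destination" columns (check the headers in your own export, as they can vary between versions); the sample rows and example.com URLs are made up for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical sample of a "Client Error Inlinks" export.
# Assumption: the real export includes "Source" and "Destination" columns.
SAMPLE_CSV = """Type,Source,Destination,Status Code
Hyperlink,https://www.example.com/,https://www.example.com/old-page,404
Hyperlink,https://www.example.com/blog/,https://www.example.com/old-page,404
Hyperlink,https://www.example.com/blog/,https://www.example.com/missing,404
"""

def broken_link_counts(csv_text):
    """Count how many inlinks point at each broken destination URL."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["Destination"] for row in reader)

def pages_to_fix(csv_text):
    """Map each source page to the sorted list of broken URLs it links to."""
    reader = csv.DictReader(io.StringIO(csv_text))
    result = {}
    for row in reader:
        result.setdefault(row["Source"], set()).add(row["Destination"])
    return {src: sorted(dests) for src, dests in result.items()}

# Most-linked broken URLs first: these are usually the best candidates for a redirect.
for url, n in broken_link_counts(SAMPLE_CSV).most_common():
    print(n, url)
```

For a real export, read the file with `open(path, newline="", encoding="utf-8-sig")` and pass its contents in place of `SAMPLE_CSV`.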

Crawl the website with Screaming Frog
Export the “Internal” report from the first tab
Filter the relevant columns to keep only URLs that return a 200 (Status Code) and are indexable (Indexability)
Paste the filtered rows into a new tab/sheet and delete the irrelevant columns, so you just have the URL and Canonical columns
In a third column, use the =EXACT formula (e.g. =EXACT(A2,B2)) to check whether each URL exactly matches its canonical URL
Filter that EXACT column to “TRUE”
You should be left with all indexable URLs, i.e. URLs that return a 200 status code and exactly match their canonical URL.
Copy all the URLs that remain
Switch to List Mode (Mode > List), paste the URLs into Screaming Frog and crawl them
When the crawl finishes, go to the hreflang tab and sort by hreflang “Occurrences”
Export the hreflang data
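The spreadsheet filtering steps above (200 status, indexable, URL exactly equal to canonical) can also be sketched in Python. The column names "Address", "Status Code", "Indexability" and "Canonical Link Element 1" are assumptions about the Internal export, so adjust them to match your file; the sample rows are invented.

```python
import csv
import io

# Hypothetical sample of the "Internal" export.
# Assumption: these column names match your Screaming Frog version.
SAMPLE_CSV = """Address,Status Code,Indexability,Canonical Link Element 1
https://www.example.com/,200,Indexable,https://www.example.com/
https://www.example.com/?sort=price,200,Non-Indexable,https://www.example.com/
https://www.example.com/old,301,Non-Indexable,
https://www.example.com/no-canonical,200,Indexable,
"""

def indexable_urls(csv_text):
    """Return URLs that return 200, are Indexable and exactly match their canonical."""
    urls = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Status Code"] != "200":
            continue
        if row["Indexability"] != "Indexable":
            continue
        # Like =EXACT() in a spreadsheet, == is a case-sensitive comparison.
        # A blank canonical never matches, so pages without one are dropped too.
        if row["Address"] == row["Canonical Link Element 1"]:
            urls.append(row["Address"])
    return urls

# One URL per line, ready to paste into List Mode.
print("\n".join(indexable_urls(SAMPLE_CSV)))
```

Note that this, like the EXACT step, drops indexable pages that have no canonical tag at all; drop the final check if you want to keep those.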
Just discovered we have lots of non-200 hreflang links.
It’s definitely worth checking the filters on the hreflang tab, such as “Non-200 Hreflang URLs”.
See Screaming Frog’s official documentation for more detail.
To exclude URLs, go to:
Configuration > Exclude (in the top menu bar)
To exclude URLs within a specific folder, use the following regex:
^https://www.mydomain.com/customer/account/.*
^https://www.mydomain.com/checkout/cart/.*
The regex above stops Screaming Frog from crawling the /customer/account/ folder and the /checkout/cart/ folder.
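Before starting a long crawl, it can help to sanity-check exclude patterns against a few sample URLs. This Python sketch assumes each pattern has to match the whole URL, and Python's `re` module may differ in small ways from the regex engine Screaming Frog uses, so treat it as a quick check rather than a guarantee; the sample URLs are hypothetical.

```python
import re

# Exclude patterns for the example domain www.mydomain.com.
EXCLUDES = [
    r"^https://www.mydomain.com/customer/account/.*",
    r"^https://www.mydomain.com/checkout/cart/.*",
]

def is_excluded(url, patterns=EXCLUDES):
    """True if any exclude pattern matches the whole URL."""
    return any(re.fullmatch(p, url) for p in patterns)

for url in [
    "https://www.mydomain.com/customer/account/login/",
    "https://www.mydomain.com/checkout/cart/",
    "https://www.mydomain.com/customer/",
]:
    print(is_excluded(url), url)
```

The unescaped dots match any character, so escaping them (`www\.mydomain\.com`) is stricter, although either version works for these folders.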
I’ve just been using the image file extensions to exclude images from the crawl, e.g.
.*jpg
You can also stop images being crawled in the Configuration > Spider menu.
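Rather than one line per extension, a single pattern can cover several image types. The pattern below is a hypothetical alternative, not one from Screaming Frog's docs: `(?i)` makes it case-insensitive so `.JPG` is caught too, and anchoring on the extension at the end avoids excluding a page such as /blog/jpg-vs-png/. It is checked here with Python's `re`, assuming whole-URL matching.

```python
import re

# Hypothetical combined image exclude: case-insensitive, extension must end the URL.
IMAGE_EXCLUDE = r"(?i).*\.(jpe?g|png|gif|webp|svg)$"

def is_image(url):
    """True if the URL ends in one of the listed image extensions."""
    return re.fullmatch(IMAGE_EXCLUDE, url) is not None

for url in [
    "https://www.mydomain.com/media/photo.JPG",
    "https://www.mydomain.com/media/logo.svg",
    "https://www.mydomain.com/blog/jpg-vs-png/",
]:
    print(is_image(url), url)
```

Image URLs with a query string (e.g. photo.jpg?v=2) are not matched by this pattern; add `(\?.*)?` before the `$` if your site uses them.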
To exclude every URL that contains a query string (i.e. a “?”), this appears to do the job:
^.*\?.*
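A quick way to confirm what that pattern catches is to compare it with Python's standard URL parser. This sketch assumes whole-URL matching; the sample URLs are hypothetical.

```python
import re
from urllib.parse import urlsplit

QUERY_STRING_EXCLUDE = r"^.*\?.*"

def has_query_string(url):
    """True if the exclude pattern matches, i.e. the URL contains a '?'."""
    return re.fullmatch(QUERY_STRING_EXCLUDE, url) is not None

for url in [
    "https://www.mydomain.com/shoes?colour=red",
    "https://www.mydomain.com/shoes",
]:
    # urlsplit(...).query is the text after '?', so the two checks should agree here.
    print(has_query_string(url), bool(urlsplit(url).query), url)
```

One edge case: a URL ending in a bare "?" matches the regex even though its parsed query is empty, which is usually what you want for an exclude.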