Disallow via robots meta tag or X-Robots-Tag?

If faceted search creates certain URL parameters that you don’t want search engines or web crawlers to index, you can keep those pages out of the index with a robots meta tag. Just add the following noindex tag to the <head> section of your page:
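
<meta name="robots" content="noindex">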

This will prevent search engines from indexing those pages. You can even customize the tag to target specific crawlers, like Googlebot. However, it won’t do anything to free up crawl budget or preserve link equity. If you want to do that, you’ll need to add a nofollow as well, like this:
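
<meta name="robots" content="noindex, nofollow">

To limit the directive to a single crawler rather than all of them, swap "robots" for that crawler’s name, e.g. <meta name="googlebot" content="noindex, nofollow">.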

While that’s a great solution for single URLs, it’s not very scalable. If you have hundreds (or thousands) of ecommerce product pages, you’ll need to use the X-Robots-Tag HTTP header instead.

Let’s say, for example, your faceted search results always appear after /filter/ or /sort/ in your URL. In that case, all you’d need to do is have your server add an X-Robots-Tag header with noindex (and nofollow) to any URL containing /sort/ or /filter/. Because the server rules that apply the header support regular expressions (regex), you can exclude multiple parameters or folders from crawling and indexation with a single pattern.
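
As a rough sketch, assuming an Apache 2.4 server with mod_headers enabled (and that your faceted URLs really do all contain /filter/ or /sort/), a rule like this in your server config or .htaccess attaches the header to every matching URL:

# Requires Apache 2.4+ with mod_headers enabled
<If "%{REQUEST_URI} =~ m#/(filter|sort)/#">
    Header set X-Robots-Tag "noindex, nofollow"
</If>

On nginx, the equivalent would be an add_header X-Robots-Tag directive inside a location block that matches the same pattern.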

Previously, webmasters handled this with a robots.txt file. However, it’s important to note that as of September 1, 2019, Google no longer supports the noindex directive in robots.txt.
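
For reference, that retired robots.txt syntax looked roughly like this; Google now ignores these Noindex lines entirely:

User-agent: *
Noindex: /filter/
Noindex: /sort/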

BE WARNED: your URL patterns must be pristine and consistent for this to work. Otherwise, you might unintentionally block important pages, or fail to catch every instance of duplication. And even then, it doesn’t guarantee that your pages won’t be indexed.