What are crawl directives?

Crawl directives are the Allow and Disallow instructions inside robots.txt that tell a crawler which URL paths it may fetch. They are matched per user-agent, so different bots can be given different access.
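As a rough illustration of per-user-agent matching, the sketch below feeds a small robots.txt to Python's standard-library `urllib.robotparser` and asks what two bots may fetch. The file contents, the paths, and the `ExampleBot` name are assumptions made up for this example; the standard-library parser is used here only to show grouping and does not handle wildcard patterns.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one group for GPTBot, one catch-all group.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private/

User-agent: *
Disallow: /
"""

def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network fetch)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)
for agent in ("GPTBot", "ExampleBot"):  # ExampleBot is a made-up name
    for path in ("/blog/post", "/private/notes"):
        print(agent, path, parser.can_fetch(agent, path))
```

In this setup GPTBot falls under its own group and is only kept out of /private/, while any bot without a named group falls back to the catch-all group.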

How it works

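Each directive applies to the user-agent group it sits under. Specificity matters: most modern crawlers honour the longest matching rule, so a narrow Allow can carve an exception out of a broad Disallow. For those crawlers the order of lines within a group does not decide the outcome; match length does. When an Allow and a Disallow match with equal length, Google applies the less restrictive rule, the Allow.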
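The longest-match idea can be sketched in a few lines of Python. This is a simplified illustration, not any crawler's actual implementation: it handles plain path prefixes only (no wildcards), breaks ties in favour of Allow, and the `is_allowed` function and the example rules are assumptions for this sketch.

```python
from typing import List, Tuple

Rule = Tuple[str, str]  # (directive, path prefix), e.g. ("disallow", "/docs/")

def is_allowed(path: str, rules: List[Rule]) -> bool:
    """Longest-match evaluation over plain path prefixes (no wildcards)."""
    best_len = -1
    allowed = True  # no matching rule means the path may be fetched
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # empty or non-matching rules are ignored
        is_allow = directive.lower() == "allow"
        # A longer prefix wins; on a tie, Allow wins (least restrictive).
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed

# Hypothetical group: block /docs/ but carve out /docs/public/.
rules = [("disallow", "/docs/"), ("allow", "/docs/public/")]
print(is_allowed("/docs/internal/roadmap", rules))  # False
print(is_allowed("/docs/public/pricing", rules))    # True
print(is_allowed("/blog/post", rules))              # True
```

Reversing the order of the two rules gives the same answers, which is the point: the longer match decides, not the line position.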

Directives accept simple wildcards (* for any sequence of characters, $ for end of URL), which is how you block a pattern such as /*?sort= without naming every URL.
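One way to reason about these wildcards is to translate a pattern into a regular expression. The sketch below assumes the two wildcards described above and nothing else; `pattern_to_regex` and `matches` are hypothetical helper names, and the sample paths are made up.

```python
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with "any sequence".
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def matches(pattern: str, url_path: str) -> bool:
    # re.match anchors at the start, as rules match from the start of the path.
    return pattern_to_regex(pattern).match(url_path) is not None

print(matches("/*?sort=", "/products?sort=price"))           # True
print(matches("/*?sort=", "/products?page=2"))                # False
print(matches("/*.pdf$", "/files/report.pdf"))                # True
print(matches("/*.pdf$", "/files/report.pdf?download=1"))     # False
```

Note that /*?sort= only catches URLs where sort is the first query parameter; a URL such as /products?page=2&sort=price would need a separate pattern.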

Crawl directives vs the robots meta tag

A robots.txt directive governs whether a URL is fetched at all. The robots meta tag (and its X-Robots-Tag HTTP header equivalent) lives in the page response and governs what the engine does after fetching, such as noindex or nofollow. One gates the door; the other gives instructions once inside.
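To make the "once inside" half concrete, here is a minimal sketch that collects indexing directives from a response that has already been fetched, reading both the X-Robots-Tag header and the robots meta tag. The function name `indexing_directives` and the sample response are assumptions; the sketch ignores bot-specific forms such as a user-agent prefix in the header or a meta tag named after a single crawler.

```python
from html.parser import HTMLParser
from typing import Dict, Set

def _split(value: str) -> Set[str]:
    """Split a comma-separated directive list into lowercase tokens."""
    return {part.strip().lower() for part in value.split(",") if part.strip()}

class _RobotsMetaParser(HTMLParser):
    """Collects the content of every <meta name="robots"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key.lower(): (value or "") for key, value in attrs}
        if attr.get("name", "").lower() == "robots":
            self.directives |= _split(attr.get("content", ""))

def indexing_directives(headers: Dict[str, str], html: str) -> Set[str]:
    """Union of directives found in the X-Robots-Tag header and the meta tag."""
    found: Set[str] = set()
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            found |= _split(value)
    parser = _RobotsMetaParser()
    parser.feed(html)
    return found | parser.directives

# Hypothetical response for a page that was allowed to be crawled.
headers = {"Content-Type": "text/html", "X-Robots-Tag": "noindex"}
html = '<html><head><meta name="robots" content="nofollow"></head></html>'
print(sorted(indexing_directives(headers, html)))  # ['nofollow', 'noindex']
```

None of this runs for a URL that robots.txt disallows, because the response is never fetched in the first place.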

Why it matters for B2B

Sloppy directives quietly starve AI crawlers. A wildcard meant to block a tracking parameter can also block a whole content directory if the pattern is too greedy. Because AI crawlers are newer and less forgiving than Googlebot, an over-broad Disallow can silently remove you from a model's citable set with no error anywhere in your analytics.
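A greedy pattern is easy to catch before it ships if you test your Disallow lines against a short list of pages that must stay crawlable. The sketch below is one possible check under simplifying assumptions: it looks at Disallow patterns only (no Allow overrides, no user-agent groups), and the patterns, paths, and the `audit` helper are all made up for illustration.

```python
import re
from typing import Dict, List

def _blocks(pattern: str, path: str) -> bool:
    """True if a Disallow pattern (with * and $ wildcards) matches the path."""
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(piece) for piece in core.split("*"))
    return re.match(regex + ("$" if anchored else ""), path) is not None

def audit(disallow_patterns: List[str], must_crawl: List[str]) -> Dict[str, List[str]]:
    """Map each Disallow pattern to the must-crawl paths it would block."""
    report: Dict[str, List[str]] = {}
    for pattern in disallow_patterns:
        hits = [path for path in must_crawl if _blocks(pattern, path)]
        if hits:
            report[pattern] = hits
    return report

# Hypothetical rules: the first was meant to block a "ref" tracking parameter.
patterns = ["/*ref", "/*?ref="]
must_crawl = ["/reference/api-limits", "/blog/crawl-budget", "/pricing"]
print(audit(patterns, must_crawl))
# {'/*ref': ['/reference/api-limits']}
```

Here the loose /*ref pattern swallows the whole /reference/ directory, while the tighter /*?ref= pattern leaves every must-crawl page alone.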

Common mistake

Putting noindex inside robots.txt. Google does not support it there, and if you also Disallow the URL the crawler can never read a noindex on the page itself. Allow the crawl, set noindex on the page.
