Allowing certain URLs and denying the rest with robots.txt


I need to allow only a few particular directories and deny the rest. My understanding is that you should allow first and then disallow the rest. Is what I have set up below correct?

Allow: /word-lists/words-that-start-with/letter/z/
Allow: /word-lists/words-that-end-with/letter/z/
Disallow: /word-lists/words-that-start-with/letter/
Disallow: /word-lists/words-that-end-with/letter/

Best answer, by methode:

Your snippet looks OK; just don't forget to add a User-agent line at the top.
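For example, assuming the rules should apply to all crawlers:

User-agent: *
Allow: /word-lists/words-that-start-with/letter/z/
Allow: /word-lists/words-that-end-with/letter/z/
Disallow: /word-lists/words-that-start-with/letter/
Disallow: /word-lists/words-that-end-with/letter/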

The order of the Allow/Disallow directives doesn't matter currently, but it's up to the client to make the correct choice. See the Order of precedence for group-member records section in our robots.txt documentation.

[...] for allow and disallow directives, the most specific rule based on the length of the [path] entry will trump the less specific (shorter) rule.
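Under that longest-match precedence, the line order of your snippet is irrelevant: the Allow rules win for the z/ directories simply because their paths are longer. Here is a minimal Python sketch of the idea (a hypothetical helper with plain prefix matching only, no wildcard or $ support):

def allowed(path, rules):
    # Longest-match precedence: among all rules whose path is a prefix
    # of the requested path, the longest pattern wins. Google documents
    # that a tie between Allow and Disallow goes to the least
    # restrictive rule, i.e. Allow.
    best_len, best_directive = -1, "Allow"  # no match at all => allowed
    for directive, pattern in rules:
        if path.startswith(pattern) and (
            len(pattern) > best_len
            or (len(pattern) == best_len and directive == "Allow")
        ):
            best_len, best_directive = len(pattern), directive
    return best_directive == "Allow"

rules = [
    # Disallow deliberately listed first to show order doesn't matter here.
    ("Disallow", "/word-lists/words-that-start-with/letter/"),
    ("Allow", "/word-lists/words-that-start-with/letter/z/"),
]
print(allowed("/word-lists/words-that-start-with/letter/z/zebra", rules))  # True
print(allowed("/word-lists/words-that-start-with/letter/a/apple", rules))  # False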

The original RFC does state that clients should evaluate rules in the order they're found; however, I don't recall any crawler that actually does that. Instead, they play it safe and follow the most restrictive rule.

To evaluate if access to a URL is allowed, a robot must attempt to match the paths in Allow and Disallow lines against the URL, in the order they occur in the record. The first match found is used. If no match is found, the default assumption is that the URL is allowed.
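You can check your snippet against that first-match behaviour with Python's standard-library parser, urllib.robotparser, which (as far as I can tell) evaluates rules in file order and uses the first match, as the RFC describes. Because your Allow lines come before the Disallow lines, it produces the result you want; the sample paths below are just placeholders:

from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Allow: /word-lists/words-that-start-with/letter/z/
Allow: /word-lists/words-that-end-with/letter/z/
Disallow: /word-lists/words-that-start-with/letter/
Disallow: /word-lists/words-that-end-with/letter/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# The z/ directories hit an Allow line first; other letters fall
# through to a Disallow line; unrelated paths match nothing and are
# allowed by default.
print(rp.can_fetch("*", "/word-lists/words-that-start-with/letter/z/zebra"))  # True
print(rp.can_fetch("*", "/word-lists/words-that-start-with/letter/a/apple"))  # False
print(rp.can_fetch("*", "/word-lists/"))  # True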