Previously, `robotparser` implemented an old pre-standard specification that nobody uses now. It returned the result of the first matching rule, which gave incorrect results in many cases (see #83368). After #138907 it follows the longest path rule.
The code can be optimized, for example by sorting the rules by path length, matching them from longest to shortest, and stopping once the current match is longer than any of the remaining paths. This only works for paths that do not contain the metacharacters `*` and `$`.
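To make the idea concrete, here is a minimal sketch of the sorted-rules approach. It is not the code from the draft PR: `Rule`, `sort_rules` and `can_fetch_sorted` are hypothetical names, and it assumes every path is a literal prefix (no `*` or `$`). When an allow rule and a disallow rule have the same length, the sketch lets the allow rule win.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    path: str    # literal path prefix; assumed to contain no '*' or '$'
    allow: bool  # True for an Allow line, False for a Disallow line


def sort_rules(rules):
    """Order rules longest path first; on equal length, allow rules first."""
    return sorted(rules, key=lambda r: (-len(r.path), not r.allow))


def can_fetch_sorted(sorted_rules, url_path):
    """Return the verdict of the longest matching rule.

    Because the rules are sorted by decreasing path length, the first
    match is the longest one, so the loop can stop immediately.
    """
    for rule in sorted_rules:
        if url_path.startswith(rule.path):
            return rule.allow
    return True  # no rule matched: fetching is allowed


rules = sort_rules([
    Rule("/", False),
    Rule("/public/", True),
    Rule("/public/private/", False),
])
print(can_fetch_sorted(rules, "/public/index.html"))
print(can_fetch_sorted(rules, "/public/private/data"))
```

The sort happens once at parse time, so each lookup is a single pass that usually ends early. Rules with metacharacters would still need to be checked separately.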
Other optimizations are also possible, for example a trie-like structure, which could also handle paths with metacharacters. But this would significantly complicate the code.
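For comparison, a rough sketch of what a trie could look like for literal paths only. The names `TrieNode`, `build_trie` and `trie_can_fetch` are hypothetical, and support for `*` and `$` is deliberately left out, since that is where most of the extra complexity would come from.

```python
class TrieNode:
    __slots__ = ("children", "allow")

    def __init__(self):
        self.children = {}  # maps a character to the next TrieNode
        self.allow = None   # None means no rule ends at this node


def build_trie(rules):
    """Build a character trie from (path, allow) pairs with literal paths."""
    root = TrieNode()
    for path, allow in rules:
        node = root
        for ch in path:
            node = node.children.setdefault(ch, TrieNode())
        # If the same path appears twice, let the allow rule win.
        node.allow = allow or bool(node.allow)
    return root


def trie_can_fetch(root, url_path):
    """Walk the trie along url_path and keep the deepest rule seen."""
    node = root
    result = True if root.allow is None else root.allow
    for ch in url_path:
        node = node.children.get(ch)
        if node is None:
            break
        if node.allow is not None:
            result = node.allow  # a deeper node is a longer match
    return result


root = build_trie([
    ("/", False),
    ("/public/", True),
    ("/public/private/", False),
])
print(trie_can_fetch(root, "/public/index.html"))
print(trie_can_fetch(root, "/public/private/data"))
```

A lookup here costs at most one step per character of the URL path, regardless of how many rules there are, but the structure is noticeably heavier than a sorted list.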
I am not actually sure that such an optimization is necessary. In most cases the number of rules should not be too large, which is why I did not include it in the previous PR. We need to collect some data first, so I am publishing my code as a draft.
Linked PRs