
SEO Q&A: How do I prevent pages from being crawled and indexed?

One of our valued readers sent in a question asking why one of the site pages he blocked with a Disallow directive in Robots.txt is still returned in Google SERPs.

So here’s my detailed explanation of the problem as well as the solution:

To me, blocking pages via Robots.txt has always been more about saving the bot’s time than about actually trying to hide anything. Search bots crawl on a budget: the more "extra" pages you exclude from the very start, the more time the bot can spend finding your content-rich pages and including (or updating) them in the index.
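As an illustration, here is a minimal Robots.txt that keeps crawlers away from low-value sections (the /search/ and /tag/ paths are hypothetical placeholders, not from the reader’s site):

    # robots.txt — ask all bots to skip internal search results and tag archives
    User-agent: *
    Disallow: /search/
    Disallow: /tag/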

What the standard “Disallow” directive still cannot do is make Google drop the page out of the index. So you may end up seeing those blocked pages in Google SERPs: Google won’t know what they actually contain, so it will make judgements based on both internal and external references to those pages.

So quite a natural question raised by the situation described above is: "How do I make Google ignore those "extra" pages completely, neither wasting the crawler’s time on them nor listing them in SERPs?"

The answer is not as simple as it may seem. The widely used "NoIndex" meta tag won’t work here, because Google won’t see it: the page is blocked from crawling, so Google can never fetch it to read the Robots meta tag.
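For reference, this is the Robots meta tag in question. It lives in the page’s own HTML head, which is exactly why it only takes effect if the crawler is allowed to fetch the page that carries it:

    <!-- In the <head> of the page; invisible to a bot blocked by robots.txt -->
    <meta name="robots" content="noindex">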

There are two other possible solutions though:

1. Use the Robots.txt Disallow directive and then use the URL removal tool within Google Webmaster Tools;

2. Use the Robots.txt Noindex directive. It is unofficially supported by Google and can be one of the steps to help sculpt PageRank. This directive blocks the page from being both crawled and indexed (see the sketch below):
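A minimal sketch of what that second option could look like, assuming a placeholder path /extra-page.html; since the directive is unofficial, it is only known to be honored by Googlebot, not by other crawlers:

    # robots.txt — unofficial Google-only directive: drop the page from the index
    User-agent: Googlebot
    Noindex: /extra-page.html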
