I've been searching for how exactly to block certain dynamic URLs from Googlebot. The search bots for Yahoo! Slurp and MSNBot use the same, or virtually identical, syntax to block dynamic URLs. As an example, I have one rule in my .htaccess file that lets me serve static pages rather than dynamic pages, but I found that Googlebot will occasionally still crawl my dynamic pages. This can lead to duplicate content, which is frowned upon by the major search engines.
I'm trying to tidy up my personals web site, since it currently ranks well with Yahoo but not with Google. I believe MSN Live uses algorithms similar to Google's, though this is not scientifically proven at all; I only state it from my own personal experience with SEO and my clients' sites. I believe I've found some answers for ranking well with Google, MSN and possibly Yahoo, and I'm in the midst of testing them right now. I've already managed to rank well on Google for a client's web site for related keywords. Anyway, here's how to block the dynamic pages from Google using your robots.txt file. First, the following is an extract of my .htaccess file:
RewriteRule ^personals-dating-(.*)\.html$ /index.php?page=view_profile&id=$1 [L]
This rule, in case you're wondering, lets me create static pages such as personals-dating-4525.html from the dynamic link index.php?page=view_profile&id=4525. However, it has caused issues, as Googlebot has now "charged" me with duplicate content. Duplicate content is frowned upon and puts more load on Googlebot, because it must crawl extra pages, and those pages may well be read as spammy by the algorithm. The moral is that duplicate content should be avoided at all costs.
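For context, here is a sketch of how a rule like this might sit in a complete .htaccess file. The RewriteEngine and RewriteBase lines and the tightened `([0-9]+)` pattern are my assumptions for illustration, not part of the original extract:

```apache
# Assumes mod_rewrite is available on your host
RewriteEngine On
RewriteBase /

# Map static-looking profile URLs onto the dynamic script.
# ([0-9]+) restricts the match to numeric profile IDs;
# [L] stops rule processing once this rule matches.
RewriteRule ^personals-dating-([0-9]+)\.html$ /index.php?page=view_profile&id=$1 [L]
```

With this in place, a request for personals-dating-4525.html is served internally by index.php?page=view_profile&id=4525, while the browser's address bar keeps the static-looking URL.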
What follows is an extract of my robots.txt file:
Disallow: /index.php?page=view_profile&id=*
Notice the "*" (asterisk) at the end of the line above. It simply tells Googlebot to ignore any number of characters in the asterisk's position. For example, Googlebot will ignore index.php?page=view_profile&id=4525 or any other number or set of characters there. In other words, these dynamic pages will not be indexed. You can check whether the rules in your robots.txt file will function correctly by logging into your Google Webmaster control panel account. If you don't have a Google account, you can simply create one through Gmail, AdWords or AdSense, and you'll get access to the Google webmaster tools and control panel. If you're serious about achieving higher rankings then you should have one. After that, all you need to do is be logged into your Gmail, AdWords or AdSense account; they make it pretty simple to create an account, and it's free. Click the "Diagnostics" tab, then the "robots.txt analysis tool" link under the Tools section in the left column.
By the way, your robots.txt file should be in your webroot folder. Googlebot checks your site's robots.txt file once a day, and the copy shown in your Google Webmaster control panel under the "robots.txt analysis tool" section will be updated accordingly.
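Putting it together, a minimal robots.txt along these lines might look like the sketch below. One note: Disallow rules are prefix matches, so the trailing asterisk is optional for Google, and leaving it off keeps the file readable by crawlers that don't support wildcards at all:

```
User-agent: *
Disallow: /index.php?page=view_profile&id=
```

This blocks every URL that begins with /index.php?page=view_profile&id= for all compliant bots, while the rewritten personals-dating-NNNN.html pages remain crawlable.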
To test your robots.txt file and validate that your rules will work correctly with Googlebot, simply enter the URL that you want to test in the field "Test URLs against this robots.txt file". I added the following line to that field:
/index.php?page=view_profile&id=4235 (please note that I did add the root of my site to the front of this URL; it has been omitted here because including it would be in violation of ArticleDashboard's rules).
Then I clicked the "Check" button at the bottom of the page, and the tool confirmed that Googlebot will block this URL given these rules. I believe blocking Googlebot this way is a better solution than the "URL Removal" tool, which you could also use; that tool is in the left column of your Google Webmaster control panel. I've read of several cases in the Google Groups where people have had problems with the "URL Removal" tool.
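If you'd like a rough offline sanity check of a rule before logging into the control panel, Python's standard urllib.robotparser can approximate it. Be aware that this parser does plain prefix matching and does not implement Google's "*" wildcard, so the sketch below drops the trailing asterisk (Disallow rules are prefix matches anyway); example.com is a placeholder domain:

```python
import urllib.robotparser

# A minimal robots.txt using a prefix rule. urllib.robotparser does
# not understand Google's "*" wildcard, so the trailing asterisk is
# dropped -- Disallow rules are prefix matches anyway.
ROBOTS_TXT = """\
User-agent: *
Disallow: /index.php?page=view_profile&id=
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The dynamic profile URL is blocked...
print(parser.can_fetch(
    "Googlebot",
    "http://example.com/index.php?page=view_profile&id=4525"))  # False

# ...while the static-looking rewrite target stays crawlable.
print(parser.can_fetch(
    "Googlebot",
    "http://example.com/personals-dating-4525.html"))  # True
```

This is only an approximation of Googlebot's behavior, but it catches obvious mistakes such as a typo in the Disallow path.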