velu liked web (programs, article, news, etc): Robots.txt

robots.txt is a plain text file that tells search engine crawlers which pages on your site they may crawl. Well-behaved search engines read the file and follow its instructions, although compliance is voluntary. If you do not want some of your pages to be crawled, list them in robots.txt and compliant search engines will skip them. Note that robots.txt is not a security mechanism: it only asks crawlers to stay away.

A search engine requests your robots.txt file before it crawls your pages. The file must be placed in the root directory of your site. Otherwise, the crawler will not be able to find it on your server.
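
As a small illustration of where a crawler looks for the file, the sketch below derives the robots.txt address from a page address using Python's standard library. The host www.example.com and the page path are placeholders chosen for this example, not part of any real site setup.

```python
from urllib.parse import urljoin

# Hypothetical page address, used only for illustration.
page_url = "http://www.example.com/sample/example.php"

# A crawler looks for robots.txt at the root of the host,
# no matter how deep the page itself sits in the site.
robots_url = urljoin(page_url, "/robots.txt")

print(robots_url)
```

Whatever folder the page lives in, the result always points at the root of the site, which is why a robots.txt file placed in a subfolder is never found.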

Syntax:
User-agent: *
Allow: /
where,
User-agent names the crawler that the rules apply to; * matches all crawlers.
Allow: / lets the matched crawlers crawl all of your content.

Disallow all crawlers with robots.txt:
User-agent: *
Disallow: /
Now compliant search engines will not crawl any of your pages.
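
One way to check the effect of these rules is Python's built-in urllib.robotparser module. The sketch below parses the two lines above from a string instead of fetching them from a server; the crawler name AnyBot and the example.com address are made up for the test.

```python
from urllib.robotparser import RobotFileParser

# The rules shown above, kept in a string so nothing is downloaded.
rules = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# "AnyBot" is a hypothetical crawler name; "*" matches every crawler.
blocked_home = not parser.can_fetch("AnyBot", "http://www.example.com/")
blocked_page = not parser.can_fetch("AnyBot", "http://www.example.com/about.html")

print(blocked_home, blocked_page)
```

Both checks report the address as blocked, because Disallow: / covers every path on the site.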

Disallow a particular file with robots.txt:
User-agent: *
Disallow: /sample/example.php
Now compliant search engines will not crawl the example.php file in the sample folder. Other files in that folder can still be crawled.
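
The same rule can be tried out with Python's urllib.robotparser. This is only a sketch: the crawler name AnyBot, the host www.example.com, and the second file other.php are assumptions added for the comparison.

```python
from urllib.robotparser import RobotFileParser

# The rule shown above, parsed from a string.
rules = """\
User-agent: *
Disallow: /sample/example.php
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# The disallowed file is blocked for every crawler.
can_fetch_example = parser.can_fetch(
    "AnyBot", "http://www.example.com/sample/example.php")

# A different (hypothetical) file in the same folder is not affected.
can_fetch_other = parser.can_fetch(
    "AnyBot", "http://www.example.com/sample/other.php")

print(can_fetch_example, can_fetch_other)
```

To block the whole folder instead of a single file, the rule would be written as Disallow: /sample/ so that every path starting with that prefix is matched.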

Robots meta tag:
The robots meta tag is separate from robots.txt. It controls whether an individual page appears in search results, and it must be placed inside the head tag of that page.
For example,
<meta name="robots" content="noindex">
The above line instructs search engines not to show this page in search results.
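
To show how a crawler might read this tag, here is a minimal sketch using Python's html.parser module. The class name RobotsMetaParser and the sample page are invented for this example; real search engines use their own parsers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            # The content may hold several comma-separated directives.
            self.directives += [
                d.strip().lower() for d in content.split(",") if d.strip()
            ]


# Hypothetical page containing the tag shown above.
page = ('<html><head><meta name="robots" content="noindex"></head>'
        '<body>Hello</body></html>')

meta_parser = RobotsMetaParser()
meta_parser.feed(page)
is_noindex = "noindex" in meta_parser.directives

print(is_noindex)
```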

Disallow a particular search engine with robots.txt:
If you want to stop a particular search engine from crawling your pages, name its crawler in the User-agent line.

Syntax:
User-agent: Googlebot
Disallow: /
Now all search engines except Google can crawl your pages. Googlebot is the name of Google's web crawler.
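
A quick check with Python's urllib.robotparser shows that the rule applies to Googlebot only. In this sketch Bingbot stands in for any other crawler, and www.example.com is a placeholder host.

```python
from urllib.robotparser import RobotFileParser

# The rule shown above, parsed from a string.
rules = """\
User-agent: Googlebot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

url = "http://www.example.com/index.html"  # placeholder address

# Googlebot matches the User-agent line and is blocked.
googlebot_allowed = parser.can_fetch("Googlebot", url)

# Any other crawler has no matching rule, so it is allowed by default.
bingbot_allowed = parser.can_fetch("Bingbot", url)

print(googlebot_allowed, bingbot_allowed)
```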

Source: http://www.phponwebsites.com/2013/12/robottxt.html

Friday, December 20, 2013