
What is robots.txt? How to add robots.txt on Nginx?

 

What is a robots.txt file?

A robots.txt file tells search engine robots which pages they can or can't crawl on a site. It can also point crawlers to the site's sitemap, a list of the site's pages, so search engines don't have to discover new pages on their own.


Example robots.txt file

You can see NameOcean's robots.txt file at nameocean.net/robots.txt. It contains the following lines:

User-agent: *
Disallow: /recaptcha
Disallow: /tag/
Sitemap: https://nameocean.net/sitemap.xml

It says that any search engine may crawl all pages except those under "/recaptcha" and "/tag/". It also says that we have a sitemap at nameocean.net/sitemap.xml.
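You can check how these rules behave with Python's standard-library robots.txt parser. A quick sketch (the URLs below are just illustrative paths on the same host):

```python
# Checking the robots.txt rules above with urllib.robotparser.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /recaptcha
Disallow: /tag/
Sitemap: https://nameocean.net/sitemap.xml
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Pages under /recaptcha and /tag/ are disallowed for every crawler...
print(rp.can_fetch("*", "https://nameocean.net/recaptcha"))   # False
print(rp.can_fetch("*", "https://nameocean.net/tag/python"))  # False
# ...while everything else may be crawled.
print(rp.can_fetch("*", "https://nameocean.net/"))            # True
```

Note that `can_fetch` matches rules by path prefix, which is exactly how crawlers interpret `Disallow` lines.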


How to add robots.txt to your website?

If you are using a simple HTTP server that serves every file under a path such as /var/www, you can add robots.txt as a plain text file in that directory. We use Nginx at NameOcean and serve our robots.txt directly from the Nginx configuration. Here is the relevant part of our nginx conf file:

server {
    listen 443 ssl;
    server_name nameocean.net;

    location = /robots.txt {
        add_header Content-Type text/plain;
        return 200 "User-agent: *\nDisallow: /recaptcha\nDisallow: /tag/\nSitemap: https://nameocean.net/sitemap.xml\n";
    }

    # ... other rules
}
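If you would rather keep robots.txt as a file on disk instead of inlining it in the configuration, an exact-match location block like the following also works (the /var/www/html path is just an example; use your own web root):

```nginx
location = /robots.txt {
    root /var/www/html;  # serves the file /var/www/html/robots.txt
}
```

With this variant, you edit the robots.txt file directly and don't need to reload Nginx after every change.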

To allow all web crawlers to crawl every page, use:

User-agent: *
Disallow:
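You can verify with Python's built-in urllib.robotparser that an empty Disallow line blocks nothing (the example.com URL is just a placeholder):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse(["User-agent: *", "Disallow:"])

# An empty Disallow rule means every path may be crawled.
print(rp.can_fetch("*", "https://example.com/any/page"))  # True
```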