Web.nl operates a web crawling cluster
for research and development of a Dutch search engine.
Crawler ethics
Our crawler is designed to follow general crawler ethics:
politeness, adherence to the robots exclusion standard, and clear
identification. We do not send successive HTTP requests to
the same host more than once every few seconds, and we respect the Crawl-delay directive.
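The per-host delay described above can be sketched as a simple rate limiter. This is a hypothetical illustration, not the actual WebNL code; the three-second default and the function name are assumptions:

```python
import time

# Hypothetical in-memory politeness table: last request time per host.
_last_request: dict[str, float] = {}

DEFAULT_DELAY = 3.0  # seconds between requests to one host (assumed value)

def wait_for_host(host: str, delay: float = DEFAULT_DELAY) -> float:
    """Sleep until `host` may be contacted again; return seconds slept."""
    now = time.monotonic()
    # Hosts we have never contacted are ready immediately.
    ready_at = _last_request.get(host, float("-inf")) + delay
    slept = max(0.0, ready_at - now)
    if slept:
        time.sleep(slept)
    _last_request[host] = time.monotonic()
    return slept
```

A Crawl-delay value found in a site's robots.txt would simply override `delay` for that host.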
The spiders can be identified with the following HTTP User-Agent
string:
* Mozilla/5.0 (compatible; WebNL; +http://www.web.nl/webmasters/spider.html)
How do I control which pages of my website are crawled?
Our software obeys the robots.txt exclusion standard, described at www.robotstxt.org, and
responds to the agent names "WebNL" and "webnl". To restrict access
to certain locations, put the following in your robots.txt file:
  User-agent: WebNL
  Disallow: /
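Python's standard urllib.robotparser module implements the same standard; the sketch below checks rules like the one above against a hypothetical robots.txt (the example.com URLs, paths, and delay value are illustrative):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt restricting WebNL, with a Crawl-delay directive.
robots_txt = """\
User-agent: WebNL
Disallow: /private/
Crawl-delay: 5
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Agent matching is case-insensitive, so "WebNL" and "webnl" behave the same.
print(parser.can_fetch("webnl", "http://example.com/private/page.html"))  # False
print(parser.can_fetch("WebNL", "http://example.com/index.html"))         # True
print(parser.crawl_delay("WebNL"))                                        # 5
```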
If you do not have permission to edit the /robots.txt file on your
server, you can still tell robots not to index your pages or follow
your links. The standard mechanism for this is the robots META tag,
as described at www.robotstxt.org.
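For example, to keep compliant robots from indexing a page or following its links, add the standard robots META tag to the page's head section:

```html
<head>
  <meta name="robots" content="noindex, nofollow">
</head>
```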
WEB.NL
E-mail: helpdesk@web.nl