Our Web Crawler

The crawler is the piece of software that walks the web, following links and finding new pages to include in the search engine index.

The CAN Search Engine spider is simple and well behaved. In its current configuration it pauses for 2 to 30 seconds between page fetches, depending on how large and complex the pages are. It does not start a new fetch from your site until the previous page has been parsed and added to the index.
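
For illustration only, the pause described above could be pictured as a delay that grows with page size and is clamped to the 2 to 30 second range. This is a rough sketch and not the spider's actual code: the function name crawl_delay, the bytes_per_second scaling factor, and the use of page size alone as the measure of "big and complex" are all assumptions made for the example.

```python
def crawl_delay(page_bytes: int,
                min_delay: float = 2.0,
                max_delay: float = 30.0,
                bytes_per_second: float = 10_000.0) -> float:
    """Return a pause in seconds before the next fetch from the same site.

    Hypothetical formula: one second per `bytes_per_second` bytes of page,
    clamped to the [min_delay, max_delay] range stated in the text.
    """
    raw = page_bytes / bytes_per_second
    return max(min_delay, min(max_delay, raw))


if __name__ == "__main__":
    # A tiny page gets the minimum pause, a huge page the maximum.
    for size in (500, 100_000, 5_000_000):
        print(size, crawl_delay(size))
```

A crawler built this way would fetch one page, parse and index it, sleep for the computed delay, and only then request the next page from the same site.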

If you notice any problems with our spider on your website, please contact us at the following email address:

info [ a t ] cansearchengine [ d o t ] com

Our web crawler obeys the robots.txt file, so you can block it from indexing all or just part of your site. The user-agent name to block is SBSearch.

Use this code in robots.txt to block access to all files and subfolders of your /secret/ folder:

User-agent: SBSearch
Disallow: /secret/

You can block our search engine from all content on your site using the following code:

User-agent: SBSearch
Disallow: /
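
If you would like to check your rules before publishing them, one option is Python's standard urllib.robotparser module, which applies robots.txt rules much as a compliant crawler would. The sketch below is only a convenience for webmasters and makes no claim about how our spider parses the file; the helper name allowed and the example.com URLs are placeholders chosen for the example.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# The two rule sets shown on this page.
PARTIAL_BLOCK = """\
User-agent: SBSearch
Disallow: /secret/
"""

FULL_BLOCK = """\
User-agent: SBSearch
Disallow: /
"""

if __name__ == "__main__":
    # Blocked: the page lives under /secret/.
    print(allowed(PARTIAL_BLOCK, "SBSearch", "https://example.com/secret/page.html"))
    # Allowed: the page is outside /secret/.
    print(allowed(PARTIAL_BLOCK, "SBSearch", "https://example.com/index.html"))
    # Blocked: the whole site is disallowed for SBSearch.
    print(allowed(FULL_BLOCK, "SBSearch", "https://example.com/index.html"))
    # Allowed: the rules name only SBSearch, so other agents are unaffected.
    print(allowed(FULL_BLOCK, "OtherBot", "https://example.com/index.html"))
```

Paste your own robots.txt content in place of the two sample strings to test other paths.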

We do appreciate it when you let our robot into the public parts of your site.

Thanks for your attention,

Simon,
CAN Search Engine