# pkc 20120111: Google typically redownloads the robots.txt file every 24
# hours or after 100 visits, whichever comes first. Using Google's Webmaster
# Tools I found that Google allows a leading wildcard (*), so we don't need
# to specify the full leading path. There are two sitemaps specified here:
# http://www.mosaic-industries.com/sitemap.xml for the root website excluding /embedded-systems/ and
# http://www.mosaic-industries.com/embedded-sitemap.xml for the /embedded-systems/ subdirectory
# These sitemaps have also been submitted through Google Webmaster Tools.
# http://www.mosaic-industries.com/embedded-sitemap.xml redirects to
# http://www.mosaic-industries.com/embedded-systems/sitemap?do=sitemap
# which returns a zipped file named sitemap.xml.gz
# While the original robots.txt standard specifies that the first matching pattern wins, Google and Bing
# let an 'Allow' pattern overrule a matching 'Disallow' pattern when the 'Allow' pattern is more specific
# (i.e., a longer string).
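# The precedence rule above can be sketched as a small matcher. This is a
# simplified model of the documented Google/Bing behavior (the longest
# matching pattern wins; 'Allow' wins a length tie), not Google's actual
# implementation; the helper names _to_regex and is_allowed are illustrative,
# and details such as percent-encoding and user-agent group selection are omitted.

```python
import re

def _to_regex(pattern: str):
    # robots.txt wildcards: '*' matches any run of characters;
    # a trailing '$' anchors the pattern at the end of the URL path.
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return re.compile(regex + ("$" if anchored else ""))

def is_allowed(path: str, rules) -> bool:
    """rules: iterable of ('allow'|'disallow', pattern) pairs.
    Google/Bing precedence: among all rules matching the path, the
    longest pattern wins; on a length tie, 'allow' wins.
    A path matched by no rule is allowed."""
    best_len, best_allow = -1, True
    for verdict, pattern in rules:
        if _to_regex(pattern).match(path):  # patterns match from the start of the path
            allow = (verdict == "allow")
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len, best_allow = len(pattern), allow
    return best_allow
```

# Under this model, a URL like /embedded-systems/sitemap?do=sitemap stays
# crawlable even though '*?do=' is disallowed, because the 'Allow: *?do=sitemap$'
# pattern is the longer match.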
User-agent: *
Sitemap: http://www.mosaic-industries.com/sitemap.xml
Sitemap: http://www.mosaic-industries.com/embedded-sitemap.xml
Disallow: */cgi-bin
Disallow: *.php
Disallow: *.cgi
Disallow: */tpl_
Disallow: */ns_
Disallow: */hidden_
Disallow: */_template
Disallow: */_export
Disallow: */__template
Disallow: *?do=
Allow: *?do=sitemap$
Allow: */css.php
Allow: */js.php
Allow: */indexer.php
Allow: */?do=show_contentpages$
Allow: */tpl_index$
Allow: */embedded-systems/doku.php

# User-agent: msnbot
# User-agent: bingbot
User-agent: Aboundex
User-agent: adidxbot
User-agent: AhrefsBot
User-agent: BaiduSpider
User-agent: dotbot
User-agent: Ezooms
User-agent: Industrial Interface Web Crawler
User-agent: IstellaBot
User-agent: MJ12bot
User-agent: NextGenSearchBot
User-agent: Python-urllib
User-agent: Sogou
User-agent: Sogou web spider
User-agent: Sogou spider
User-agent: Sosospider
User-agent: WBSearchBot
User-agent: discobot
User-agent: Mail.RU
Crawl-Delay: 120

User-agent: ShopWiki
User-agent: Nutch
User-agent: BotMaster
User-agent: imbot
User-agent: 008
Disallow: /
# 20120304: if imbot and ShopWiki persist in rapid crawls, disable them in .htaccess too
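# The .htaccess blocking mentioned above could look like the following sketch,
# assuming an Apache server with mod_rewrite enabled (the bot names come from
# the note above; exact User-Agent strings would need to be verified from the
# access logs first):

```apache
# Return 403 Forbidden to requests whose User-Agent contains imbot or ShopWiki
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (imbot|ShopWiki) [NC]
RewriteRule .* - [F,L]
```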
