# Spinn3r and Radian6 index feeds and diffs
# Radian6
User-agent: Radian6
Disallow: /

User-agent: R6_FeedFetcher
Disallow: /

User-agent: R6_CommentReader
Disallow: /

# Still in beta, and crawls bad URLs
User-agent: VoilaBot
Disallow: /

# Copyright sheriff
User-agent: copyright sheriff
Disallow: /

# http://code.google.com/p/hcardvalidator/source/browse/trunk/robots.txt?r=6#
User-agent: Spinn3r
Disallow: /

User-agent: Tailrank
Disallow: /

User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1; aggregator:Tailrank (Spinn3r 2.1); http://spinn3r.com/robot) Gecko/20021130
Disallow: /

# Not sure it works, but I'll try... Russian bot which is not respectful
User-agent: Nigma
Disallow: /

# Advertising-related bots:
User-agent: Mediapartners-Google*
Disallow: /

# Crawlers that are kind enough to obey, but which we'd rather not have
# unless they're feeding search engines.
User-agent: UbiCrawler
Disallow: /

User-agent: DOC
Disallow: /

User-agent: Zao
Disallow: /

# Some bots are known to be trouble, particularly those designed to copy
# entire sites. Please obey robots.txt.
User-agent: sitecheck.internetseer.com
Disallow: /

User-agent: Zealbot
Disallow: /

User-agent: MSIECrawler
Disallow: /

User-agent: SiteSnagger
Disallow: /

User-agent: WebStripper
Disallow: /

User-agent: WebCopier
Disallow: /

User-agent: Fetch
Disallow: /

User-agent: Offline Explorer
Disallow: /

User-agent: Teleport
Disallow: /

User-agent: TeleportPro
Disallow: /

User-agent: WebZIP
Disallow: /

User-agent: linko
Disallow: /

User-agent: HTTrack
Disallow: /

User-agent: Microsoft.URL.Control
Disallow: /

User-agent: Xenu
Disallow: /

User-agent: larbin
Disallow: /

User-agent: libwww
Disallow: /

User-agent: ZyBORG
Disallow: /

User-agent: Download Ninja
Disallow: /

#
# Sorry, wget in its recursive mode is a frequent problem.
# Please read the man page and use it properly; there is a
# --wait option you can use to set the delay between hits,
# for instance.
#
User-agent: wget
Disallow: /

#
# The 'grub' distributed client has been *very* poorly behaved.
#
User-agent: grub-client
Disallow: /

#
# Doesn't follow robots.txt anyway, but...
#
User-agent: k2spider
Disallow: /

#
# Hits many times per second, not acceptable
# http://www.nameprotect.com/botinfo.html
User-agent: NPBot
Disallow: /

# http://www.turnitin.com/ - Plagiarism checker
User-agent: TurnitinBot
Disallow: /

# http://www.netseer.com/ - LA-based startup spider
User-agent: Teemer
Disallow: /

# http://www.WISEnutbot.com - LookSmart Spider
User-agent: ZyBorg
Disallow: /

# LinkWalker - Marketing Co Spider
User-agent: LinkWalker
Disallow: /

# Zeus - Marketing Co Spider
User-agent: Zeus
Disallow: /

# A capture bot, downloads gazillions of pages with no public benefit
# http://www.webreaper.net/
User-agent: WebReaper
Disallow: /

# 192.com crawls all UK websites and indexes contact information
User-agent: 192.comAgent
Disallow: /

# NetinfoBot
User-agent: NetinfoBot
Disallow: /

# Picsearch - indexing pictures from the web
User-agent: psbot
Disallow: /

# Accelobot is a search engine for online marketing trends and emerging technologies.
User-agent: Accelobot
Disallow: /

# IRL-crawler is a Texas A&M research project
User-agent: IRLbot
Disallow: /

# MSRBot is a Microsoft web crawler used to collect data from the web for further study.
User-agent: MSRBot
Disallow: /

# Friendly, low-speed bots are welcome to view article pages, but not
# dynamically generated pages, please.
#
User-agent: *
Crawl-delay: 300
Disallow: /config/
Disallow: /quojapa/
Disallow: /files/
Disallow: /Special:AWCforum/?action=search
Disallow: /Special:Browse/
Disallow: /Special:Pages_
Disallow: /Special:WikiFeeds/
Disallow: /index.php
Disallow: /MediaWiki:
Disallow: /Discussion
Disallow: /Utilisateur:
Disallow: /User:
Disallow: /Talk
Disallow: /Trap/DO_NOT_CLICK_HERE