Domain Manage

Blocking a rogue bot

Discussion in 'Domain Traffic / Keyword Research' started by admin, Nov 30, 2009.

Thread Status:
Not open for further replies.
  1. admin Spain

    admin Administrator Staff Member

    Joined:
    Jun 2004
    Posts:
    10,083
    Likes Received:
    115
    One of my domains is showing a massive bandwidth jump, which turns out to be the wise-guys.nl search bot hitting the site hard.

    2009 Aug 163 MB
    2009 Sept 1,483 MB
    2009 Oct 1,198 MB
    2009 Nov 1,395 MB

    Vagabondo 762.75 MB 28 Nov 2009 - 11:55
    Unknown robot 409.77 MB 29 Nov 2009 - 04:37

    How do I block it?
     
  2. retired_member13

    retired_member13 Banned

    Joined:
    Jul 2009
    Posts:
    1,317
    Likes Received:
    33
    You could try blocking their IP addresses or ranges from your .htaccess file. A full IP address blocks that specific host; a partial one (the second deny line below) blocks the whole range. You should be able to get the bots' IP addresses from your log files.

    Additions take the form of:

    <Limit GET POST>
    # With "order allow,deny" a matching deny overrides the allow,
    # so the listed addresses are refused and everyone else gets in.
    order allow,deny
    deny from 193.49.176.139
    deny from 193.49.177
    allow from all
    </Limit>
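
    If the box is on Apache 2.4 or later (an assumption - check your version), Order/Allow/Deny gives way to Require, so a minimal sketch of the same block in the newer syntax would be:

    Code:
    # mod_authz_core (Apache 2.4+): refuse the listed address and range,
    # let everyone else through.
    <RequireAll>
    Require all granted
    Require not ip 193.49.176.139
    Require not ip 193.49.177
    </RequireAll>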
     
  3. Skinner

    Skinner Well-Known Member

    Joined:
    Jul 2008
    Posts:
    4,325
    Likes Received:
    81
    I noticed a massive increase much like yours on my bounce rate experiment (and learned that AWStats actually counts this traffic in its data rather than just making you aware of it). The bots there are doing 36k+ hits a month, totalling over 1.2 GB.

    I put the + markers because I haven't looked in about five days, so those figures are approximate.

    Most of mine is from the Google Image search bot by the look of it; it seems to archive a thumbnail of every graphic rather than the full image, so it is hammering my bandwidth to fetch them :(

    You should be able to block it as Ty said with IP range blocking; you could block by identifier instead, but the unknown robot wouldn't be covered that way.
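
    For the identifier route, a minimal .htaccess sketch, assuming mod_setenvif is available (the bot name comes from admin's AWStats listing; the unknown robot would still slip through):

    Code:
    # Tag any request whose User-Agent contains "Vagabondo", then refuse it;
    # everyone else is allowed through.
    SetEnvIfNoCase User-Agent "Vagabondo" bad_bot
    <Limit GET POST>
    order allow,deny
    allow from all
    deny from env=bad_bot
    </Limit>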
     
  4. jimm United Kingdom

    jimm Active Member

    Joined:
    Feb 2008
    Posts:
    688
    Likes Received:
    13
    Bandwidth is cheap, so why block it unless it's taxing your server?
    That said, the Vagabondo bot does read robots.txt, so block it there if you really want to.
     
  5. accelerator United Kingdom

    accelerator Well-Known Member

    Joined:
    Apr 2005
    Posts:
    7,397
    Likes Received:
    109
    Don't know how up to date this is, but here's some bot-blocking code from a .htaccess file; you'll have to substitute your own domain for YourSite.co.uk:

    Code:
    
    # Keep housekeeping files out of directory listings.
    IndexIgnore .htaccess */.??* *~ *# */HEADER* */README* */_vti*
    
    # With "order deny,allow" the allow directives win, so normal
    # GET/POST browsing stays open to everyone...
    <Limit GET POST>
    order deny,allow
    deny from all
    allow from all
    </Limit>
    # ...while PUT and DELETE are refused outright.
    <Limit PUT DELETE>
    order deny,allow
    deny from all
    </Limit>
    AuthName YourSite.co.uk
    
    ####### kill some bad bots
    # User-agent tests chain with [OR]; the rule sends a 403 Forbidden
    # to anything that matches.
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} ^Balihoo [OR]
    RewriteCond %{HTTP_USER_AGENT} ^BlackWidow
    RewriteRule .* - [F,L]
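
    Each RewriteCond line is a single user-agent test, so extra bots can be added one per line - every condition except the last needs the [OR] flag, and the final RewriteRule refuses anything that matched.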
    
    Rgds
     
  6. 2dareis2do

    2dareis2do New Member

    Joined:
    Feb 2010
    Posts:
    1
    Likes Received:
    0
    wise-guys.nl

    Try adding the following to your robots.txt file to see if this makes a difference:

    Code:
    # Blocking Wiseguys as it is sucking all my bandwidth
    # Vagabondo/4.0; webcrawler at wise-guys dot nl; http://webagent.wise-guys.nl/
    
    User-agent: Vagabondo
    Disallow: /
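
    If you would rather throttle the crawler than ban it outright, some bots also honour the nonstandard Crawl-delay directive - whether Vagabondo does is an assumption you'd need to test:

    Code:
    # Ask the crawler to wait 30 seconds between requests (nonstandard,
    # ignored by bots that don't support it).
    User-agent: Vagabondo
    Crawl-delay: 30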
     
    Last edited by a moderator: Feb 10, 2010
  7. DaveH United Kingdom

    DaveH Active Member

    Joined:
    Apr 2008
    Posts:
    593
    Likes Received:
    7

    lol you don't run your own servers or a large site then!

    • High CPU utilization
    • Unnecessary database queries (and more log files)
    • Unnecessary disk space eaten by webserver log files
    • Unnecessary disk I/O, which causes 99% of performance problems IME


    First I'd try the robots.txt file to see if the bot obeys it - if not, look up its IP address and block the range.


    For really large sites that are heavily indexed, I tend to use the agent list from http://en.wikipedia.org/robots.txt
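
    The pattern there is simply one block per misbehaving crawler; a sketch with hypothetical bot names - substitute the agents you actually see in your logs:

    Code:
    # Hypothetical agent names for illustration only.
    User-agent: SomeAggressiveBot
    Disallow: /
    
    User-agent: AnotherScraper
    Disallow: /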
     
    • Like x 1
  8. jimm United Kingdom

    jimm Active Member

    Joined:
    Feb 2008
    Posts:
    688
    Likes Received:
    13
    You would struggle to get much more wrong, to be honest.
    Admittedly I have scaled back since I sold part of my hosting business 18 months ago, but I still have a lot of hardware in use alongside administering some decent-sized sites. I am still a small fish, just not quite as small as you think ;)

    Meh, it can happen, but if I get these issues it's normally because normal use is taking the server towards its designed limit anyway.
    Spec your hardware for the peaks and troughs, and a slightly higher peak is nothing to panic about.

    Plus I did say to block it in robots.txt if you really want to, and admin was talking about bandwidth.
     
  9. DaveH United Kingdom

    DaveH Active Member

    Joined:
    Apr 2008
    Posts:
    593
    Likes Received:
    7
    Fair play - I was just miffed initially by that comment, given the number of headaches I've had in the past with bots and other automated querying.
     