
Blocking a rogue bot

Status
Not open for further replies.

Admin

Administrator
Staff member
Joined
Jun 14, 2004
Posts
11,076
Reaction score
962
One of my domains is showing a massive bandwidth jump, which turns out to be the wise-guys.nl search bot hitting the site hard.

2009 Aug 163 MB
2009 Sept 1,483 MB
2009 Oct 1,198 MB
2009 Nov 1,395 MB

Vagabondo 762.75 MB 28 Nov 2009 - 11:55
Unknown robot 409.77 MB 29 Nov 2009 - 04:37

How do I block it?
 
You could try blocking their IP addresses or ranges from your .htaccess file. A full IP address blocks that specific host; a partial one (the second deny line below) blocks the whole range. You should be able to get the bots' IP addresses from your log files.

Additions take the form of:

<Limit GET POST>
order allow,deny
deny from 193.49.176.139
deny from 193.49.177
allow from all
</Limit>
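If you're not sure which addresses to put in those deny lines, you can tally response bytes per client IP from the access log. A minimal sketch, assuming Apache's standard "combined" log format; the sample lines, IPs, and sizes here are made up for illustration:

```python
import re
from collections import Counter

# A few sample lines in Apache "combined" log format; in practice
# you would read these from your real access log file instead.
SAMPLE_LOG = """\
193.49.176.139 - - [28/Nov/2009:11:55:02 +0000] "GET /page.html HTTP/1.1" 200 5120 "-" "Vagabondo/4.0"
193.49.177.10 - - [28/Nov/2009:11:55:03 +0000] "GET /img/a.jpg HTTP/1.1" 200 20480 "-" "Vagabondo/4.0"
10.0.0.1 - - [28/Nov/2009:11:56:00 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0"
"""

# client IP ... "METHOD path" status size
LINE_RE = re.compile(r'^(\S+) .*?"[A-Z]+ [^"]*" \d{3} (\d+|-)')

def bytes_per_ip(log_text):
    """Total response bytes served to each client IP."""
    totals = Counter()
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if m:
            ip, size = m.group(1), m.group(2)
            totals[ip] += 0 if size == "-" else int(size)
    return totals

totals = bytes_per_ip(SAMPLE_LOG)
for ip, size in totals.most_common():
    print(ip, size)
```

The heaviest IPs at the top of that list are the candidates for the deny lines (or, if they share a prefix, for a single partial-address range block).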
 
I noticed a massive increase much like yours on my bounce rate experiment (and learned that AWStats actually counts this data, not just makes you aware of it). Bots are doing 36k+ hits a month, totalling over 1.2 GB+.

I put the + marker because I haven't looked in about 5 days but that was approx.

Most of mine is from the Google Image search bot by the look of it; they seem to archive a thumbnail of all the graphics but not the whole thing. So they are hammering my bandwidth to get the images :(

Should be able to block it as Ty said with IP range blocking; you could also block by user-agent identifier, but the unknown one wouldn't be covered.
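Blocking by identifier could look something like this in .htaccess — a sketch using the standard SetEnvIfNoCase directive (mod_setenvif) plus an env-based deny; as noted, the "Unknown robot" sends no distinctive agent string, so it would slip through this:

```apache
# Flag requests whose User-Agent contains "Vagabondo" (case-insensitive)
SetEnvIfNoCase User-Agent "Vagabondo" bad_bot
<Limit GET POST>
    order allow,deny
    allow from all
    # Deny any request that was flagged above
    deny from env=bad_bot
</Limit>
```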
 
Bandwidth is cheap, why block it unless it is taxing your server?
But the Vagabondo bot does read robots.txt, so block it in there if you really want to.
 
Don't know how up to date this is, but here's some bot blocking code from a .htaccess file; you'll have to swap in your own domain for YourSite.co.uk:

Code:
IndexIgnore .htaccess */.??* *~ *# */HEADER* */README* */_vti*

<Limit GET POST>
order deny,allow
deny from all
allow from all
</Limit>
<Limit PUT DELETE>
order deny,allow
deny from all
</Limit>
AuthName YourSite.co.uk


####### kill some bad bots
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^Balihoo [OR]
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow
# return 403 Forbidden to any agent matched above
RewriteRule .* - [F,L]

Rgds
 
wise-guys.nl

Try adding the following to your robots.txt file to see if this makes a difference:

Code:
# Blocking WiseGuys as it's sucking all my bandwidth
# Vagabondo/4.0; webcrawler at wise-guys dot nl; http://webagent.wise-guys.nl/

User-agent: Vagabondo
Disallow: /
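You can sanity-check a robots.txt snippet like that locally before deploying it, using Python's standard-library parser — a quick sketch:

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules suggested above, as a string for local testing.
ROBOTS_TXT = """\
User-agent: Vagabondo
Disallow: /
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Vagabondo is disallowed everywhere; other crawlers are unaffected.
print(rp.can_fetch("Vagabondo", "/some/page"))
print(rp.can_fetch("Googlebot", "/some/page"))
```

Of course this only tells you what a well-behaved bot *should* do; whether Vagabondo actually honours the rule you can only see from the logs afterwards.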
 
Bandwidth is cheap, why block it unless it is taxing your server?
But the Vagabondo bot does read robots.txt, so block it in there if you really want to.


lol you don't run your own servers or a large site then!

  • High CPU utilization
  • Unnecessary Database Queries (more log files)
  • Unnecessary Disk space from webserver log files
  • Unnecessary Disk IO, which causes 99% of performance problems IME


First I'd try the robots file to see if it obeys it - if not, look up its IP address and block the range.


For really large sites that are heavily indexed, I tend to use agents from http://en.wikipedia.org/robots.txt
 
lol you don't run your own servers or a large site then!

You would struggle to get much more wrong, to be honest.
Admittedly I have scaled back since I sold part of my hosting business 18 months ago, but I still have a lot of hardware in use alongside administering some decent-sized sites. I am still a small fish, just not quite as small as you think ;)

  • High CPU utilization
  • Unnecessary Database Queries (more log files)
  • Unnecessary Disk space from webserver log files
  • Unnecessary Disk IO, which causes 99% of performance problems IME
Meh, it can happen, but if I get these issues it's normally because normal use is taking the server towards its design limit anyway.
Spec your hardware for the peaks and troughs, and a bit higher peak is nothing to panic about.

Plus I did say
Me said:
unless it is taxing your server
and admin was talking about bandwidth.
 
Fair play - I was just "miffed" initially by that comment, due to the amount of headaches I've had in the past with bots and other automated querying.
 
