Blocking unwanted requests to reduce server load



Dirk

Site Admin
Admin
Registered: 12/01/02
Posts: 13073
Location:Stuttgart, Germany
Most tips to speed up a site revolve around optimizing SQL requests or throwing bigger and better hardware at the problem. But if you check your logfiles these days, you'll find a lot of nonsensical requests that can also have a quite dramatic negative impact on your site's performance. Here are a few tips on how to deal with them ...

Santy and other worms

The outbreak of the original Santy worm, which only attacked phpBB boards, was quickly followed by variants (now called Spyki or PhpInclude worm) that target all PHP scripts out there - including Geeklog.

The Spyki worm tries to exploit a common programming mistake in PHP scripts where the author includes another file based on some parameter passed in the URL. The worm simply tries to exploit this with a brute-force attack on all parameters it can find for a script.

Geeklog itself is not vulnerable to this attack (can't speak for all the existing plugins and other add-ons, but I'm not aware of any problems with them at the moment). But the sheer amount of requests caused by this worm can really slow down a Geeklog site.

So what can we do? The idea is to detect the worm's requests on the server before they're actually executed. I.e. we use the webserver's abilities to catch these requests and make sure the PHP script they're trying to attack is not executed. This saves CPU time (for calling up the PHP interpreter) and DB load (for creating sessions, loading the blocks, etc.).

On geeklog.net, we currently use this in the site's .htaccess file:
# attempts to stop the Santy worm
RewriteEngine On
RewriteCond %{QUERY_STRING} ^(.*)wget%20 [OR]
RewriteCond %{QUERY_STRING} ^(.*)echr(.*) [OR]
RewriteCond %{QUERY_STRING} ^(.*)esystem(.*) [OR]
RewriteCond %{QUERY_STRING} ^(.*)highlight=%2527 [OR]
RewriteCond %{HTTP_USER_AGENT} lwp-trivial [NC,OR]
RewriteCond %{HTTP_COOKIE} s:(.*):%22test1%22%3b
RewriteRule ^.*$ http://127.0.0.1/ [L,R=301]
 

As explained above, this tries to detect patterns typical of the worm's requests and then redirects them to 127.0.0.1. While it is unlikely that the worm will even follow that redirect, it at least saves our webserver the trouble of having to execute the nonsensical request.
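Since the worm probably won't follow the redirect anyway, one possible variation (not what geeklog.net uses, just a sketch) is to refuse the request outright with a 403 Forbidden. Only two of the conditions from above are repeated here to keep it short; the others would carry over unchanged:

```apache
# Sketch: same idea as above, but answer with 403 Forbidden
# instead of redirecting to 127.0.0.1.
RewriteEngine On
RewriteCond %{QUERY_STRING} ^(.*)wget%20 [OR]
RewriteCond %{HTTP_USER_AGENT} lwp-trivial [NC]
# "-" means "no substitution"; [F] sends the 403 and stops rule processing.
RewriteRule ^.*$ - [F]
```

This only saves load if the 403 error page itself is static. If an ErrorDocument 403 points to a PHP script, every blocked request would run that script instead.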

This is not the place to discuss and explain how Apache's mod_rewrite works. Check the Apache manual if you want to learn more about it.

The above rules are derived from similar ones you can find on the web. This site, for example, has similar rules for mod_security (an Apache 2 module) and also discusses some flaws in the above rules (but they seem to work for us for now ...).

Referrer spam

The same approach can also be used against that stupid referrer spam. If you check your logfiles, you'll often find requests allegedly coming from porn or mortgage sites. If you look closely, you'll notice that each of them sends only one request for a story or a forum post and doesn't load any images. So it's not someone actually following a link to your site, it's just a stupid bot trying to draw attention to that site.

Again, if those requests come in a lot, they can increase the server load quite a bit. So we use the same idea as for the worms to catch them before the PHP script is even executed:
# Referrer spam :-(
RewriteCond %{HTTP_REFERER} ^http://.*hosting4u.gb.com.*$ [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://.*4free.gb.com.*$ [NC]
RewriteRule ^.*$ http://127.0.0.1/ [L,R=301]
 

The two URLs in this example are two from a batch of sites that are being used in referrer spam at the moment. They will probably be gone in a few days or weeks and replaced by others.

Which is the main problem with referrer spam: It's a moving target. So most of the time, I'd say don't bother and ignore them. Use the above only if you see a lot of requests and your server's load is increasing because of them.
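Because the list of spam domains keeps changing, it may be easier to maintain one line per domain. A minimal sketch, assuming Apache 2.4 with mod_setenvif and mod_authz_core, and assuming both are allowed in .htaccess (AllowOverride FileInfo and AuthConfig). The variable name spam_ref is made up for this example:

```apache
# Flag requests whose Referer header mentions a known spam domain.
# "spam_ref" is a hypothetical variable name.
SetEnvIfNoCase Referer "hosting4u\.gb\.com" spam_ref
SetEnvIfNoCase Referer "4free\.gb\.com" spam_ref

# Allow everyone except flagged requests, which get a 403 Forbidden.
<RequireAll>
    Require all granted
    Require not env spam_ref
</RequireAll>
```

On Apache 2.2 and older, the access part would be written with Order/Allow/Deny and "Deny from env=spam_ref" instead of the RequireAll block.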

404s and home-made problems

Some days ago, I was debugging a script and wondering why it caused SQL requests even after it had collected all the data it needed to display the page. The reason was that it was trying to load an image that wasn't there. And because I had set up Apache to use Geeklog's 404.php for the "404 Not Found" error message, it called up that script every time it couldn't find the image.

In case you're not aware, you can set up your own 404 page in a .htaccess like this:
ErrorDocument 404 /404.php
 

So now every time a file is not found, the error message comes nicely wrapped in Geeklog.

However, that also means that every 404 causes the execution of a PHP script and requests to the database. So if you have a lot of 404s, you're creating a lot of load for both your server and your database. Check your logfiles regularly and try to fix those 404s.
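For missing images and other static files, nobody gets to see the nicely wrapped error page anyway. One way to limit the damage, sketched here under the assumption that your Apache allows FilesMatch sections in .htaccess, is to keep 404.php for pages but fall back to a plain text message for static file types. The list of extensions is only an example:

```apache
# Pages still get the 404 message wrapped in Geeklog.
ErrorDocument 404 /404.php

# Assumed list of static file extensions; adjust it to your site.
# Missing files of these types get a short plain text 404, no PHP involved.
<FilesMatch "\.(gif|jpe?g|png|ico|css|js)$">
    ErrorDocument 404 "Not Found"
</FilesMatch>
```

It's worth requesting a non-existent image after adding this, to check that the plain message is really what comes back on your setup.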

Hope these tips help someone ...

bye, Dirk


destr0yr

Forum User
Full Member
Registered: 06/07/02
Posts: 324
Quote by Dirk:
Hope these tips help someone ...

bye, Dirk

Indeed they will. Thank you Dirk. Happy New Years!
-- destr0yr "I love deadlines. I like the whooshing sound they make as they fly by." -- Douglas Adams


THEMike

Forum User
Moderator
Registered: 25/07/03
Posts: 141
Location:Sheffield, UK
I have a new version of my HTTP_REFERER module for Geeklog pending, just ironing out a couple of details. It integrates with spamX to detect referer spam and ignore it for logging purposes. Additionally, I have an alternative config value for the spamx action: rather than sending the default one from your config.php, it sends its own configurable value. Added to that, I have a custom action for spamx to perform a die(); command when spam is detected and thus stop you wasting a single further CPU cycle on HTTP_REFERER spammers.


bcbrock

Forum User
Chatty
Registered: 04/02/03
Posts: 64
indescribable
Obviously, the prudent thing to do is implement whatever preventative measures we can to ensure our individual sites are not affected - at least, not affected much. I wonder however, if a GL site that requires user logon to view or post content ( $_CONF['loginrequired'] = 1 ) is somewhat protected from Santy and other like worms???
~Brian


Dirk

Site Admin
Admin
Registered: 12/01/02
Posts: 13073
Location:Stuttgart, Germany
Quote by bcbrock: I wonder however, if a GL site that requires user logon to view or post content ( $_CONF['loginrequired'] = 1 ) is somewhat protected from Santy and other like worms???

Depends on what you mean by "protected".

As I said, a Geeklog site cannot be infected by this worm. However, since that worm stupidly tries to call each and every PHP script it can find, your site's performance can certainly be affected. Login required or not - calling up the script and doing SQL requests only to display a "you have to be logged in" error message will have a certain impact on the load of your webserver and database.

The method described above tries to ease that load by deflecting the worm's requests.

Anyone remember the Code Red worm that attacked Microsoft's IIS webservers some years ago? It couldn't infect Apache webservers, but still caused server loads and annoying logfile entries there. As with the Santy worm, there's nothing you can do at your end to actually stop those attacks (other than taking down your site ...). All you can do is to try to minimize the impact it has on your site / server.

bye, Dirk


machinari

Forum User
Full Member
Registered: 22/03/04
Posts: 1512
Quote by Dirk: So now every time a file is not found, the error message comes nicely wrapped in Geeklog.

However, that also means that every 404 causes the execution of a PHP script and requests to the database. So if you have a lot of 404s, you're creating a lot of load for both your server and your database. Check your logfiles regularly and try to fix those 404s.
yes yes, 404s were being returned for most of those nasty worm's requests. so scripts were running and the db was taking a hit (cuz i was using gl's 404.php). just now implemented some of the above rewrite rules... looking forward to positive results. thanks Dirk.
