Subject: avoid 404 errors due to apache not decoding url

Posted on: 17/08/2014 08:41am
By: remy

In GL 2.1 there is a new and highly appreciated logfile called 404.log.
This file lists every 404 error encountered (dead links). Some are caused by ordinary users who simply mistype a real link; others are caused by bots scraping incorrect links from somewhere else.

However, I do see a lot of 404s for links that are URL-encoded: '%3F' in place of the question mark and '%26' in place of the ampersand. It seems that Apache does not properly handle requests that the browser has URL-encoded.
Then I also see a lot of 404s starting with 'RK=0/RS='. These seem to come from Yahoo search results.

The following is a quick patch to .htaccess to correct Apache's behaviour. There is a lot of information on the Internet on the subject; just search for %3F or 'RK=0'.
The patch below did the job for me.

PHP Formatted Code

RewriteEngine On

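# Misbehaving bots: these URLs are scraped from Yahoo search results
# (Yahoo result links contain RK and RS segments). tenants modification:
RewriteRule ^(.*)RK=0/RS= /$1 [L,NC,R=301]
# The caret is escaped so it matches a literal '^' instead of acting as a start-of-string anchor
RewriteRule ^(.*)RS=\^ /$1 [L,NC,R=301]

# A literal '?' in the path means it arrived encoded as %3F; split it back into path and query string
RewriteRule ^(.*)\?(.*)$ /$1?$2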
 

Re: avoid 404 errors due to apache not decoding url

Posted on: 26/08/2014 07:02am
By: remy

Now I am seeing other 404s caused by /RK=0/RS=... being appended to the query string. The root cause is dumb bots scraping links from Yahoo. Apache is not so smart in dealing with this 'trick'.
The following snippet eliminated the 404's for me:
PHP Formatted Code

RewriteCond %{QUERY_STRING} ^(.*)/RK(.*)$
RewriteRule ^(.*)$ /$1?%1 [L,NC,R=301]

RewriteCond %{QUERY_STRING} ^(.*)/RS(.*)$
RewriteRule ^(.*)$ /$1?%1 [L,NC,R=301]
 

Just add it to .htaccess.
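
The two rule pairs above could probably be folded into one. This is only a sketch and has not been tested on a Geeklog site. It assumes the unwanted suffix always begins with /RK= or /RS= (the rules above match /RK and /RS without the equals sign), and that everything from that point on can be thrown away, as the rules above already do. The lazy (.*?) keeps only the part of the query string before the first such suffix, so a single redirect should be enough.

```apache
# Sketch only: strip a "/RK=..." or "/RS=..." suffix from the query string.
# Assumes everything from the first such suffix onwards is junk added by scrapers.
RewriteCond %{QUERY_STRING} ^(.*?)/R[KS]= [NC]
RewriteRule ^(.*)$ /$1?%1 [L,R=301]
```

If the query string consists of nothing but the suffix, %1 is empty and the redirect target ends in a bare '?', which Apache treats as "no query string".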

Re: avoid 404 errors due to apache not decoding url

Posted on: 26/08/2014 06:40pm
By: Laugh

For those of you who don't realize it, Geeklog comes with a file called 404.php that you can set your server to use when a 404 error is generated (i.e. when a visitor requests a file that doesn't exist on your server). This produces a nicer error message than the server's default one.

This is the same 404 message that Geeklog returns when it is asked for something that does not exist (e.g. a story or staticpage).
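
On Apache, pointing the server at that file usually comes down to a single ErrorDocument line in .htaccess. A minimal sketch, assuming Geeklog is installed in the web root so that the file is reachable as /404.php (adjust the path if your site lives in a subdirectory) and that your host allows this directive in .htaccess:

```apache
# Assumed path: Geeklog's 404.php sitting in the web root
ErrorDocument 404 /404.php
```

Use a local path like the one above rather than a full http:// URL; with a full URL Apache sends a redirect and the visitor no longer receives a 404 status.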

Geeklog - Forum
https://www.geeklog.net/forum/viewtopic.php?showtopic=95786